This paper will report the results of an investigation
of  a  particular  coding  method  for  continuous  channels.   Specifi- 
cally, binary  linear  codes  (group  codes)  are  employed  as  a  coding 
method  for  the  inputs  to  a  time-discrete  continuous  channel. 
This  channel  is  presumed  to  be  perturbed  only  by  additive  noise 
with  a  Gaussian  amplitude  distribution  which  affects  each  trans- 
mitted digit  independently. 

Shannon  has  obtained  bounds  on  the  optimum  probability  of 
error  when  the  input  signals  are  considered  to  be  sequences  of 
n  real  numbers,  subject  only  to  the  constraint  that  the  signal 
power  in  each  sequence  be  a  constant.   This  is  a  nominal  restric- 
tion, resulting  in  a  very  general  theory.   The  use  of  group  codes 
for  the  input  signal  sequences  restricts  the  individual  numbers 
in the n-number sequence to take on only one of two distinct
values and further requires that the input sequences be capable
of  being  placed  into  one-to-one  correspondence  with  a  group  code. 
This  is  more  restrictive  than  the  general  case  but  has  the  advan- 
tage of  being  a  constructive  method  of  establishing  the  input 
sequences. 

Four  bounds,  two  upper  and  two  lower,  on  the  reliability 
of  such  a  coding  method  are  derived.   The  upper  bounds  show  that 
the  reliability  of  the  binary  coding  is  bounded  away  from  the 
optimum for some ranges of the transmission rate. The range of
rate  over  which  these  bounds  give  useful  information  is  a  func- 
tion of  the  signal  power  to  noise  power  ratio.   The  lower  bounds 
on  reliability  for  the  binary  coding  technique  are  below  those 
for  the  general  case  for  all  ranges  of  transmission  rate.   This 
is  an  expected  result,  since  the  binary  case  is  a  special  case  of 
the general theory. For a reasonably broad range of transmission
rate,  however,  the  lower  bounds  are  quite  close  together,  indi- 
cating that  the  binary  case  can  guarantee  reliability  only 
slightly  worse  than  that  guaranteed  by  the  general  technique. 
The  results  of  the  investigation  show  that  the  use  of  group 
codes as input signal sequences is promising.

SECTION I


INTRODUCTION 


A.   The  Coding  Problem 

Shannon (1)* has established that it is possible to transmit
information  from  a  source  to  a  receiver  over  a  communications  channel 
in  such  a  way  that  the  probability  that  an  error  will  occur  can  be 
made  as  small  as  desired,  provided  that  the  rate  of  this  information 
transmission  does  not  exceed  a  value  called  the  channel  capacity. 
This  startling  and  not  at  all  obvious  result  is  achieved  by  asso- 
ciating with  the  information  to  be  transmitted  additional  quantities 
of  data  which  serve  to  detect  and  to  correct  errors  introduced  by  a 
noisy  channel.   Shannon's  classic  results,  while  demonstrating  that 
such  a  performance  is  possible,  are  in  effect  existence  theorems 
which  do  not  provide  constructive  means  for  achieving  this  reliable 
information  transmission.   The  coding  problem  is  that  of  devising 
methods  of  appending  redundant  data  to  desired  information  in  order 
to achieve these results which are known to be possible.

The purpose of introducing redundancy into messages by
coding is to combat the effects of the noise which is present in
the transmission medium or channel. It follows, then, that work in
coding theory has been broadly classified according to the type of
channel over which communication is to take place.

*Underlined numbers in parentheses refer to the List of
References at the end of this paper.

In  general,  a  channel  may  be  considered  as  a  device  which 
transforms  successive  input  events,  each  represented  by  a  point  x 
of  an  input  space  X,  into  output  events,  each  represented  by  a 
point  y  of  an  output  space  Y.   This  transformation  of  x  to  y  is 
governed  by  a  conditional  probability  distribution  P(Y/X)  which  is 
determined  by  the  noise  in  the  channel. 

Channels  are  usually  classified  according  to  the  types  of 
the  input  and  output  spaces.   If  the  input  and  output  spaces  are 
discrete,  the  channel  is  said  to  be  discrete.   If  the  input  and  out- 
put spaces  are  continuous,  the  channel  is  said  to  be  continuous. 
Discrete-to-continuous  and  continuous-to-discrete  channels  would 
also  be  possibilities. 

Successive  events  in  a  discrete  channel  form  a  time-discrete 
sequence.   However,  two  possibilities  arise  in  the  case  of  a 
continuous  space.   A  point  representing  an  event  may  be  allowed  to 
change  only  at  specified  instants  of  time.   If  the  channel  has  input 
and  output  spaces  of  this  type,  it  is  said  to  be  time-discrete  with 
continuous  amplitudes.   Alternatively,  the  point  representing  an 
event  may  be  free  to  change  its  value  at  any  time,  i.e.,  to  move 
continuously.   If  the  channel  has  input  and  output  spaces  of  this 
type,  it  is  said  to  be  time-continuous  with  continuous  amplitudes. 
Coding theory may then be classified according to its application in
discrete channels, time-discrete channels with continuous amplitudes
or time-continuous channels with continuous amplitudes.

Research  in  coding  for  the  discrete  channel  has  been  largely 
centered  on  binary  channels,  wherein  the  input  event  or  signal  can 
have  only  two  states.   Results  in  this  area  have  been  voluminous, 
both  as  to  coding  methods  and  to  evaluation  of  possible  ranges  and 
bounds  on  error  probabilities  for  various  classes  of  discrete  codes. 
Peterson  (2)    presents  a  comprehensive  study  of  current  practices  and 
the  present  state  of  research  in  discrete  channel  coding  theory.   As 
an  indication  of  the  complexity  of  this  problem,  it  is  significant 
that even today Elias' Error-Free Coding (3) is the only known
example  of  a  constructive  error  coding  technique  which  permits  the 
realization  of  an  error  probability  approaching  zero  as  we  increase 
without  limit  the  size  of  the  information  blocks  transmitted,  while 
maintaining  a  non-zero  information  rate.   Even  this  method  requires 
that  information  be  transmitted  at  a  rate  well  below  channel  capacity, 
which  is  the  theoretical  upper  limit. 

The  search  for  techniques  for  incorporation  of  redundancy 
into  signals  for  continuous  channels,  both  time-discrete  and  time- 
continuous,  has  proceeded  largely  under  the  name  of  signal  design, 
rather than coding. Many of the error-correcting techniques
considered  have  been  correlation  methods.   While  the  object  is  the 
same  as  coding  for  discrete  channels,  the  diversity  of  the  mathemati- 
cal and  conceptual  techniques  has  resulted  in  relatively  little 
interplay  between  coding  theory  and  signal  design. 


The  application  of  coding  theory  to  continuous  channels 
has  been  restricted  almost  exclusively  to  time-discrete  channels 
with  continuous  amplitudes.   Shannon  (4)  has  obtained  results 
giving  the  possible  limits  on  error  probabilities  for  a  particular 
type of input space. Franco and Lachs (5) and Harmuth (6) have
investigated the use of orthogonal functions as signals for a
time-discrete channel in a manner which is reminiscent of discrete
coding theory.

There  has  been  to  the  author's  knowledge  very  little  work 
done  on  coding  for  time-continuous  channels  with  continuous  ampli- 
tudes except  for  calculations  of  channel  capacity,  typically  by 
Fano (7). There are two basic reasons for this. First, the transition
from time-discrete to time-continuous channels is mathematically
a formidable step. Second, while many channels which are of
practical  interest  are  of  the  time-continuous  type,  they  may  in 
many  cases  be  represented  to  a  satisfactory  degree  of  accuracy  as 
time-discrete  channels,  usually  through  sampling  techniques. 

B.   The Investigation

This  paper  will  report  the  results  of  an  investigation  of  a 
particular  coding  method  for  time-discrete  channels  with  continuous 
amplitudes.   Specifically,  binary  linear  codes  (group  codes)  will 
be  employed  as  a  coding  method  for  the  inputs  to  a  continuous 
channel  which  is  presumed  to  be  perturbed  only  by  additive  noise 
with  a  Gaussian  amplitude  distribution  which  affects  each  trans- 
mitted digit  independently.   The  signals  and  the  channel  will  be 
more  precisely  defined  in  a  later  section  of  this  paper. 


Shannon  (4)  has  obtained  results  for  a  channel  of  this  type 
when  the  input  signals  are  considered  to  be  sequences  of  n  real 
numbers,  subject  only  to  the  constraint  that  the  signal  power  in 
each  sequence  be  a  constant.   This  is  a  nominal  constraint,  resulting 
in  a  very  general  theory  for  this  type  of  channel.   The  use  of  group 
codes  for  the  input  signal  sequences  will  restrict  the  individual 
numbers  in  the  n-number  sequences  to  take  on  only  one  of  two  distinct 
values,  and  will  further  require  that  the  input  sequences  be  capable 
of  being  placed  into  one-to-one  correspondence  with  a  group  code. 
This  method  is  much  more  constrained  than  the  largely  unrestricted 
signal  sequences  allowed  in  the  work  of  Shannon.    It  has,  however, 
the  advantage  of  being  a  constructive  method  of  establishing  the  input 
sequences.   The  bulk  of  the  investigation  is  then  an  establishing  of 
comparative  results  between  the  group  code  method  and  the  unrestricted 
theory. 

Specifically,  it  is  desired  to  compare  the  reliability  of  the  two 
methods.   Reliability  has  a  precise  definition  which  will  be  given 
later  but  it  is  in  essence  a  measure  of  the  error-correcting  capa- 
bilities of  a  code.   Exact  figures  for  reliability  are  generally  not 
obtainable  when  a  large  class  of  codes  is  considered  due  to  the  many 
different  codes  within  a  class  and  also  to  inherent  mathematical  dif- 
ficulties.  Instead,  upper  and  lower  bounds  on  reliability  are  commonly 
obtained. For a specified code length an estimate or bound on reliability
does not give a very accurate determination of the probability
of error of the code. However, given a desired level of error probability,
a knowledge of reliability will permit a reasonably sharp
estimate  of  the  required  length  of  the  code.   This  is  often  the 
actual  problem  faced  in  coding  applications. 

Shannon's  unrestricted  results  contain  four  bounds  on  reliability, 
two  upper  bounds  and  two  lower  bounds.   In  this  investigation,  two  new 
upper  and  two  new  lower  bounds  on  reliability  are  presented.   These 
bounds  give  a  measure  of  the  loss  in  reliability  incurred  when  the 
restrictive  encoding  method  using  group  codes  is  employed.   Certain 
allied  results  obtained  in  the  investigation  will  also  be  presented. 

C.    The  Plan  of  This  Paper 

This  paper  contains  five  numbered  sections  and  an  appendix. 
The  first  section  is  this  Introduction.   Section  II  defines  the 
channel  and  the  encoding  technique.   Section  III  presents  the  derivation 
of  the  bounds.   Section  IV  compares  the  results  of  this  investigation 
with  the  unrestricted  case.   Section  V  contains  conclusions  and 
recommendations  for  further  work.   Allied  results  are  in  the  Appendix. 


SECTION  II 
THE  COMMUNICATION  SYSTEM  DEFINED 
A.   The  Channel 

The type of communication channel with which this investigation
is concerned is termed a "continuous, time-discrete"
channel. Fano (8) describes a continuous, time-discrete channel
as one wherein the input and output events are represented by
points of continuous, Euclidean space, but these points are permitted
to change their positions only at specified time instants.
For simplicity, it may be assumed that the input changes once each
second, and that at any given time the input consists of a real
number. The input, then, is a sequence of real numbers which change
once each second. The i-th real number will be denoted $u_i$.

The  channel  is  assumed  to  be  perturbed  by  an  additive 
noise,  whose  amplitude  has  a  Gaussian  distribution  and  which  affects 
each $u_i$ independently. At the receiver, then, the i-th real number
will be observed as $u_i + n_i$, where the $n_i$ are independent Gaussian
random  variables,  all  of  which  are  assumed  to  have  the  same 
variance  N.   The  assumption  that  the  noise  affecting  the  channel 
is  of  this  type  results  in  an  admittedly  highly  idealized  channel. 
It  is  felt,  however,  that  understanding  this  channel  thoroughly 
will  be  very  helpful  if  more  difficult  generalizations  are  attempted. 


In this investigation, the values of the $u_i$ are restricted
to be one of two distinct numbers. These numbers could be arbitrarily
chosen, but it is shown in the Appendix, Section A, that
because of the structure of group codes, minimum signal power will
result when these numbers are chosen as ±B, where B is a real
number. Hence, the i-th real number observed at the receiver will
have the form $\pm B + n_i$.
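
The channel model described above can be illustrated numerically. The following Python fragment is only a minimal sketch: the helper name `transmit` and the particular values of B and N are assumptions chosen for illustration and do not come from this paper. It adds an independent zero-mean Gaussian sample of variance N to each transmitted digit.

```python
import math
import random

def transmit(signal, noise_variance, rng):
    """Return the received sequence u_i + n_i for one block.

    Each n_i is drawn independently from a zero-mean Gaussian with
    variance `noise_variance` (N in the text).
    """
    sigma = math.sqrt(noise_variance)
    return [u + rng.gauss(0.0, sigma) for u in signal]

B = 1.0                     # signal level (illustrative value)
N = 0.25                    # noise variance (illustrative value)
rng = random.Random(1)      # fixed seed so the run is repeatable
sent = [B, -B, -B, B, B]    # inputs restricted to +B or -B
received = transmit(sent, N, rng)
```

With the noise variance set to zero the received sequence equals the transmitted one, which is a convenient sanity check on the model.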

This  channel  may  be  considered  to  be  a  form  of  sampled 
data  communication.   However,  the  arguments  to  be  used  in  developing 
the bounds on reliability will be largely geometric. No consideration
will be given to the origin of these inputs or to the allied
question  of  whether  a  continuous  function  of  time  can  be  adequately 
represented  in  the  form  given  above.   These  problems  arise  in  the 
application of this channel in a sampled system.

B.   The  Code 

The  sequences  of  real  numbers  used  as  inputs  to  the  channel 
will  be  arranged  as  a  block  code.   A  block  code  is  a  code  that  uses 
sequences  of  n  symbols  or  n-tuples.   Each  sequence  of  n  symbols  is 
termed a code word or code block. With the restriction that each
input symbol may take on only the values ±B, there are $2^n$ n-tuples
which could be used. Of this total number, only M of these will
be used as code words. In this paper, it will be further required
that each ensemble of M code words be in one-to-one correspondence
with a binary linear code or group code. Formally, this correspondence
can be achieved by mapping a binary linear code into
the set of all n-tuples containing ±B as elements, where "1" is
mapped into "B," and "0" is mapped into "-B." The resulting subset
of n-tuples with ±B as elements is then in one-to-one correspondence
with the binary linear code.

In  the  notation  of  group  codes,  an  (n,k)  code  is  a  code 
of  length  n  in  which  each  code  word  contains  k  information  digits 
and (n-k) redundancy digits. There are $M = 2^k$ code words in an
(n,k) code.
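
The construction of the input sequences from an (n,k) group code can be sketched as follows. This is an illustration under stated assumptions, not the procedure of this paper: the generator matrix `G` is one systematic form of the (7,4) Hamming code, chosen only as a familiar example, and the function names are hypothetical. The sketch enumerates the $2^k$ code words and maps "1" into B and "0" into -B.

```python
from itertools import product

# Generator matrix of a (7,4) Hamming code in systematic form.
# This specific code is an illustrative assumption.
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

def encode(message, generator):
    """Multiply a k-digit message by the generator matrix over GF(2)."""
    n = len(generator[0])
    return tuple(
        sum(m * row[j] for m, row in zip(message, generator)) % 2
        for j in range(n)
    )

def binary_code(generator):
    """Enumerate all 2**k code words of the group code."""
    k = len(generator)
    return [encode(msg, generator) for msg in product((0, 1), repeat=k)]

def to_signal(word, B):
    """Map the digit 1 into +B and the digit 0 into -B."""
    return tuple(B if bit == 1 else -B for bit in word)

codewords = binary_code(G)
signals = [to_signal(w, B=1.0) for w in codewords]
```

Because the mapping acts digit by digit and is invertible, distinct code words always give distinct signal sequences.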

A  convenient  interpretation  of  the  ensemble  of  M  code  words 
is  that  they  represent  M  points  in  n-dimensional   Euclidean  space. 
The  origin  of  coordinates  in  this  space  is  the  zero  vector,  and 
a typical point of the ensemble might be B, -B, B, -B, ... .
The points in n-space whose coordinates consist of either B or -B
will be called "binary points." Since coding involves the selection
of M of the possible $2^n$ binary points, each code word will be
a binary point, while the converse is not true. Each binary point
is at the same distance from the origin, since in n-space (9) the
length R of a line is given by

\[ R^2 = \sum_{i=1}^{n} (X_i - Y_i)^2 \]

where $X_i$ and $Y_i$ are the i-th components of the vectors X and Y,
respectively. Consider Y to be the origin and let X be any
signal vector. Then

\[ R^2 = \sum_{i=1}^{n} B^2 = nB^2 \]

Thus, each binary point lies on the surface of an n-dimensional
sphere of radius $B\sqrt{n}$.
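
The statement that every binary point lies at the same distance from the origin can be checked by direct enumeration for a small n. The values of n and B below are arbitrary illustrative choices, and the helper `radius` is a hypothetical name.

```python
import math
from itertools import product

def radius(point):
    """Euclidean distance of a point from the origin."""
    return math.sqrt(sum(x * x for x in point))

n, B = 5, 1.5                                     # illustrative values
binary_points = list(product((B, -B), repeat=n))  # all 2**n binary points
radii = [radius(p) for p in binary_points]
expected = B * math.sqrt(n)                       # radius B * sqrt(n)
```

Every entry of `radii` agrees with `expected` to within rounding error.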

A  motivation  for  the  terminology  "unrestricted  case"  as 
applied  to  the  case  treated  by  Shannon  (4)  is  apparent  in  the  fore- 
going discussion.   When  input  signal  sequences  are  permitted  to 
assume  any  value  subject  to  the  constraint  of  constant  power  for 
each  sequence,  it  can  be  shown  that  these  sequences  may  lie  any- 
where on  the  surface  of  an  n-dimensional  sphere.   When  only  two 
levels  are  used,  however,  the  sequences  are  restricted  to  the 
binary  points. 

C.   The Noise Distribution

The  noise  present  in  the  channel  may  also  be  given  a  geo- 
metrical interpretation.   Each  coordinate  of  the  signal  vector  is 
affected independently by an additive noise $n_i$, whose amplitude is
Gaussianly distributed. At any given signal point, each $n_i$ is
Gaussian with zero mean and variance N. The displacement of any
signal point from its original position due to noise can then be
considered to be a random variable $n = n_1, n_2, \ldots, n_n$, where each
$n_i$ is independent and has a one-dimensional Gaussian distribution.
Cramer (10) shows that such a random variable has a probability
distribution of the general form

\[ P(n) = \frac{1}{(2\pi)^{n/2} \sqrt{A}} \exp\left( -\frac{1}{2A} \sum_{j,k} A_{jk}\, n_j n_k \right) \tag{1} \]

where $A$ is the determinant of the moment matrix

\[ \mathbf{A} = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{bmatrix} \tag{2} \]

In this notation, $a_{ii}$ is the variance of the $n_i$, given by

\[ a_{ii} = E[(n_i - m_i)^2] \]

and $a_{ij}$ is the covariance between $n_i$ and $n_j$, given by

\[ a_{ij} = E[(n_i - m_i)(n_j - m_j)] \]

where $m_i$ is the mean of each $n_i$ and E is the usual notation for
average or expected value. By hypothesis, each $m_i$ is zero
and each variance is N.

The form of the probability distribution of n can be
obtained by determining the probability distribution of the
standard variable $t = t_1, t_2, \ldots, t_n$, where
$t_i = (n_i - m_i)/s_i = n_i/\sqrt{N}$, $s_i$ being the standard deviation of $n_i$.
Then

\[ a_{ii} = E\left[ \frac{n_i^2}{N} \right] = \frac{1}{N} E[n_i^2] = 1 \]

and

\[ a_{ij} = E\left[ \frac{n_i n_j}{N} \right] = \frac{1}{N} E(n_i)\, E(n_j) = 0, \quad i \neq j, \]

since all the $n_i$ are assumed to be independent. The moment matrix
$\mathbf{A}$ then becomes the unit matrix and its determinant $A$ is equal to 1.
In (1), all terms in the exponent vanish except those where j = k,
giving

\[ P(t) = \frac{1}{(2\pi)^{n/2}} \exp\left( -\frac{1}{2} \sum_{i=1}^{n} t_i^2 \right) \]
In n-space, the magnitude of a radius vector R from the
origin is given by $|R|^2 = \sum_{i=1}^{n} r_i^2$, where $r_i$ is the i-th component
of R. Let the magnitude of t be t. Then

\[ p(t) = \frac{1}{(2\pi)^{n/2}} \exp\left( -\frac{t^2}{2} \right) \tag{3} \]


Thus, the probability distribution of the noise displacement is
independent of direction of displacement. For this type of distribution,
the contours of equiprobable surfaces are given by

\[ \frac{1}{2A} \sum_{j,k} A_{jk}\, t_j t_k = c^2 \]

where $A_{jk}$ is the cofactor of $a_{jk}$ in $\mathbf{A}$. Since $A$ (the determinant
of $\mathbf{A}$) = 1, and

\[ A_{ij} = \begin{cases} 1, & i = j \\ 0, & i \neq j \end{cases} \]

the surfaces of equal probability are given by

\[ \frac{1}{2} \sum_{i=1}^{n} t_i^2 = c^2 \]

which is the parametric equation for a sphere in n dimensions.
Hence (3) may be termed a spherical Gaussian distribution. The
probability distribution of displacement by noise in any given direction
is independent of that direction and is a one-dimensional
Gaussian distribution.
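
The spherical character of the distribution can be illustrated numerically. The sketch below, whose function names are hypothetical, compares the product of n independent standard Gaussian densities with the closed form that depends on t only through its magnitude, and evaluates that form at two points of equal magnitude but different direction.

```python
import math

def product_density(t):
    """Product of n independent standard Gaussian densities."""
    return math.prod(
        math.exp(-x * x / 2) / math.sqrt(2 * math.pi) for x in t
    )

def spherical_density(t):
    """Closed form depending on t only through its magnitude."""
    n = len(t)
    r2 = sum(x * x for x in t)
    return math.exp(-r2 / 2) / (2 * math.pi) ** (n / 2)

a = (1.0, 2.0, 2.0)   # magnitude 3
b = (3.0, 0.0, 0.0)   # same magnitude, different direction
```

Both functions agree at `a`, and the density takes the same value at `a` and `b`.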

D.   The Decoding System

The encoding and communication process has been characterized
geometrically as the selection of points on the surface of an
n-dimensional sphere, which, in the transmission over the channel,
are displaced from their original location by noise that has a
spherical Gaussian distribution. A decoding system for such a model
is a partitioning of the n-space into M subsets corresponding to
the transmitted messages. This is a method of deciding, at the
receiver, which message was transmitted. If the received message
is in the subset corresponding to the i-th transmitted code word,
then it is presumed that it was the i-th code word which was sent.

For any code, the probability of error is defined as

\[ P_e = \sum_{i=1}^{M} p_i q_i \]

where $p_i$ is the probability that the i-th message will be transmitted,
and $q_i$ is the probability that if code word i is sent,
it will be decoded incorrectly, i.e., as a code word other than i.
For the purposes of this investigation, it is assumed that all code
words are equally likely to be transmitted, so that for a code of
M words

\[ P_e = \frac{1}{M} \sum_{i=1}^{M} q_i \]
An  optimal  decoding  system  for  a  code  is  one  which  minimizes 
the  probability  of  error.   The  Gaussian  density  function  is 
monotone decreasing with distance. The greater the displacement
of  a  point  from  its  original  position,  the  less  probable  is  that 
displacement.   With  this  noise  distribution,  an  optimal  decoding 
system  is  one  which  decodes  any  received  signal  as  being  the  code 
word  corresponding  to  the  geometrically  closest  code  word  location. 
This  type  of  decoding  is  called  minimum  distance  or  maximum  likeli- 
hood decoding.   This  decoding  system  is  assumed  to  be  used  through- 
out the  investigation  reported  in  this  paper. 
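
Minimum distance decoding as described above can be sketched in a few lines. The example is illustrative only: the two-word repetition code, the received vector, and the function names are assumptions made for the sketch and are not taken from this paper.

```python
def squared_distance(x, y):
    """Squared Euclidean distance between two points in n-space."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def decode(received, code_points):
    """Return the index of the code point closest to `received`."""
    return min(
        range(len(code_points)),
        key=lambda i: squared_distance(received, code_points[i]),
    )

B = 1.0
code_points = [(-B, -B, -B), (B, B, B)]   # a two-word repetition code
received = (0.9, -0.2, 1.4)               # a noisy version of (B, B, B)
decoded_index = decode(received, code_points)
```

Since the square root is monotone, comparing squared distances selects the same code word as comparing distances.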

One  additional  comment  pertinent  to  decoding  is  offered. 
As  noted,  the  probability  of  error  will  depend  on  the  geometrical 
distance  between  code  words.   It  would  be  possible,  for  a  fixed 
value of noise variance N, to decrease the probability of error by
placing the code words far enough apart in n-space. For a fixed N
and code length n, this would correspond to increasing the signal
level B. Loosely speaking, this is equivalent to increasing the
signal to noise ratio, which one expects to result in more reliable
communications. Thus, B will necessarily remain as a parameter
in error calculations.

E.   Euclidean Distance as Related to Hamming Distance

A  fundamental  parameter  of  discrete  codes  is  Hamming 
distance,  usually  denoted  by  d.   Hamming  distance  is  defined  to 
be  the  number  of  digits  in  which  two  code  words  differ.   For  the 
purpose  of  this  investigation,  a  relation  between  Hamming  and 
Euclidean  distance  is  required. 

Consider two points in Euclidean n-space, $w_1 = s_1, s_2, \ldots, s_n$
and $w_2 = t_1, t_2, \ldots, t_n$. The distance D between these two points
is given by

\[ D^2 = \sum_{i=1}^{n} (s_i - t_i)^2 . \]

If these two points are binary points as defined in Part B above,
then $(s_i - t_i)^2$ can have only the values 0 or $4B^2$, depending on
whether $s_i = t_i$ or $s_i \neq t_i$. Now, $w_1$ and $w_2$ may also be considered
to be code words. If the Hamming distance between these two words
is d, then $s_i \neq t_i$ in d places. Hence

\[ D^2 = 4B^2 d \]

which is the necessary relation between Hamming and Euclidean
distance.
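
The relation between Hamming and Euclidean distance can be verified on a pair of binary words. The two words and the value of B below are arbitrary illustrative choices, and the helper names are hypothetical.

```python
def hamming_distance(u, v):
    """Number of places in which two binary words differ."""
    return sum(a != b for a, b in zip(u, v))

def to_signal(word, B):
    """Map the digit 1 into +B and the digit 0 into -B."""
    return [B if bit == 1 else -B for bit in word]

def squared_euclidean(x, y):
    """Squared Euclidean distance between two signal vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

B = 2.0
u = (1, 0, 1, 1, 0, 0, 1)
v = (0, 0, 1, 0, 1, 0, 1)
d = hamming_distance(u, v)
D2 = squared_euclidean(to_signal(u, B), to_signal(v, B))
```

The two words differ in three places, and the squared Euclidean distance of the corresponding signal vectors equals 4 B^2 d.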


SECTION  III 
BOUNDS  ON  RELIABILITY 
A.   The  Concept  of  a  Bound 

The evaluation of a particular coding technique for a
communication channel is ideally done by calculating the probability
of error, $P_e$, for that technique. The $P_e$ is more properly
written $P_e(N,B,n,R)$ to show that error probability is a function
of noise power N, signal level B, code word length n, and information
rate R. Determination of an exact $P_e$ may be impossible, or
mathematically quite complex. Consequently, it is necessary to
resort to bounds on $P_e$ rather than an exact result. The bounds
usually derived on $P_e$ are functions g which permit the inequality

\[ g_1 \le P_e \le g_2 \]

to be written. The functions $g_1$ and $g_2$ can all be placed in the
form

\[ e^{-nE(R) + o(n)} \tag{1} \]

where R is a function of B and N. Here o(n) is a term of order
less than n.

This investigation is concerned with the use of group codes
as input signal sequences. For this class of codes, there are a
finite number of codes. Hence, there is a best code in the sense
that some code has a $P_e$ which is no larger than that of any other code.
It is the $P_e$ for this best code in the class of group codes that
will be bounded.

To further simplify the mathematical operations, this report
will be concerned primarily with determination of bounds on E(R),
which is called reliability. If the bounds developed on $P_e$ are
placed in the form of (1), then E(R), or simply E, is defined as

\[ E = \lim_{n \to \infty} -\frac{1}{n} \log_e P_e . \]

E is then independent of code word length n, which permits a
simplified presentation of results. E is a measure of how fast the
probability of error goes to zero. In this connection, it should
be noted that the exponent in the defining equation (1) for E is
negative. Hence, a lower bound on E will correspond to an upper
bound on $P_e$.

Knowledge of E and n will not permit the close determination
of $P_e$ from (1), since the term o(n) could be a large multiplier.
However, given E, and the $P_e$ which is desired, the necessary value
of n can be determined fairly sharply when n is large. In fact,
n will be asymptotic to $-\frac{1}{E} \log_e P_e$. In applications of coding
theory, this is normally the natural problem, i.e., how long must the
code be to achieve a given level of $P_e$.
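
The asymptotic relation between n, E and the desired error probability can be turned into a rough estimate of code length. The sketch below is an assumption-laden illustration: it ignores the o(n) term entirely, the function name is hypothetical, and the values of E and of the target error probability are arbitrary.

```python
import math

def required_length(E, target_pe):
    """Leading-order estimate n ~ -(1/E) * ln(P_e), rounded up.

    The o(n) term is neglected, so this is only meaningful
    when the resulting n is large.
    """
    return math.ceil(-math.log(target_pe) / E)

# Illustrative values: reliability 0.1 nats, target error 1e-6.
n_estimate = required_length(E=0.1, target_pe=1e-6)
```

Doubling the reliability E roughly halves the estimated length for the same target error probability.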
B.   First Upper Bound on E(R)

Plotkin  (11)  has  obtained  a  bound  on  minimum  Hamming 
distance  which  is  applicable  to  binary  codes  in  general  and  to 
binary  linear  codes,  which  are  a  subclass  of  binary  codes.   This 
bound on distance may be used as the origin of a bound on $P_e$ in
binary  coding  for  continuous  channels.   For  this  purpose,  the  most 
convenient form of the bound is given by Peterson (12) as follows:

"Consider an n-symbol linear code with symbols taken from
the field of q elements. Let k be the number of information
symbols and (n-k) the number of check symbols. If

\[ n \ge \frac{qd - 1}{q - 1} \]

then

\[ k \le n - \frac{qd - 1}{q - 1} + 1 + \log_q d \]

where d is the minimum Hamming distance between code words."

Since this investigation is concerned with binary coding,
q = 2. Then, for $n \ge 2d - 1$, the bound may be written as

\[ k \le n - 2d + 2 + \log_2 d \]

or

\[ n - k \ge 2d - \log_2 d - 2 . \tag{1} \]

Future calculations will be simplified if the bound on
minimum distance as given by (1) is changed so that d does not
appear as the argument of a logarithm. Note that

\[ d \le 2^{d-1} \]

for any positive integer value of d. Hence

\[ \log_2 d \le d - 1 . \]

Then, from (1)

\[ n - k \ge 2d - (d - 1) - 2 = d - 1 \]

so that

\[ d \le n - k + 1 . \]

This may be substituted in (1) to yield

\[ n - k \ge 2d - \log_2 (n - k + 1) - 2 \]

or

\[ d \le \tfrac{1}{2} \left[ n - k + \log_2 (n - k + 1) + 2 \right] . \tag{2} \]
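
The bound (2) is easily evaluated for specific codes. The sketch below uses a hypothetical function name and, as an illustrative assumption, the (7,4) Hamming code, whose minimum distance of 3 is a standard fact. The code satisfies the condition n >= 2d - 1, and its minimum distance does not exceed the bound.

```python
import math

def distance_bound(n, k):
    """Right side of (2): upper bound on minimum Hamming distance."""
    r = n - k                          # number of redundancy digits
    return 0.5 * (r + math.log2(r + 1) + 2)

n, k, d_min = 7, 4, 3                  # (7,4) Hamming code, d = 3
bound = distance_bound(n, k)           # evaluates to 3.5
```

Because (2) was obtained by weakening the original bound, it is not expected to be tight; it only limits how large the minimum distance can be.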

Equation (2) bounds the Hamming distance of a group code
in terms of code length n and the number of information digits k.
A more desirable form of (2) (for the purpose here) is one in which
transmission rate R or the number of code words M appears explicitly.
Rate, as used in the coding literature, is variously defined. A
common definition is that rate R is the ratio of the average number
of information symbols per code word to the average number of total
symbols per word (word length). Specifically, for an (n,k) group
code, rate is equal to k/n. Since there are $M = 2^k$ code words in a
group code, rate can also be written as $\frac{1}{n} \log_2 M$. Implicit in these
definitions, however, is the idea that rate as used in coding
theory is generally concerned with what a source (or code) is
capable of transmitting, rather than what it is actually transmitting.
For general use, a definition of rate should incorporate
this maximal concept. This may be done by using the concept of
self information.

Assume that there exists an ensemble of events X (such as
the ensemble of M code words), each of which occurs with a probability
$p(x_i)$, where $x_i$ is the i-th event. The self information of
$x_i$ is defined as

\[ I(x_i) = -\log p(x_i) . \]

The units of I are determined by the base of logarithms chosen.
Three commonly used units are bits for logarithms to the base 2,
nats for natural or Napierian logarithms, and hartleys when the
base 10 is used. Self information may be interpreted as the maximum
amount of information that can possibly be provided about the
event $x_i$.

The desired definition of R is one which will specify conditions
under which the maximum average amount of information per code
word symbol is transmitted. The total self information of the
source is simply the sum of the self information of each event. The
rate R can then be defined as the maximum of

\[ \frac{1}{n} \sum_{i=1}^{M} p(x_i)\, I(x_i) . \]

If the definition of I is used, this can be written as

\[ -\frac{1}{n} \sum_{i=1}^{M} p(x_i) \log p(x_i) \tag{3} \]

which is recognized as the entropy function of information theory,
modified by the factor 1/n. It is known that the entropy function
achieves a maximum when all the $p(x_i)$'s are equal (see, for example,
Reza (13)). Then $p(x_i) = 1/M$ for all i, and by hypothesis

\[ \sum_{i=1}^{M} p(x_i) = 1 . \]

Under this constraint, the maximum of (3) is

\[ -\frac{1}{n} \log \frac{1}{M} = \frac{1}{n} \log M . \tag{4} \]

Thus, if rate R is defined as

\[ R = \frac{1}{n} \log M , \tag{5} \]

the  desired  maximization  of  average  transmitted  information  is 
achieved. 
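The maximization leading to (5) can be checked numerically.  The sketch below is only an illustration: the (7,4) code size and the skewed distribution are hypothetical choices, not values from the text.  It evaluates expression (3) for an equiprobable ensemble and for a non-uniform one.

```python
import math

def self_information(p, base=math.e):
    """Self information -log p of an event with probability p."""
    return -math.log(p) / math.log(base)

def rate(probs, n, base=math.e):
    """Expression (3): average self information per code word symbol."""
    return sum(p * self_information(p, base) for p in probs if p > 0) / n

n, k = 7, 4                      # hypothetical (7,4) group code
M = 2 ** k
uniform = [1.0 / M] * M          # all code words equally likely
skewed = [0.5] + [0.5 / (M - 1)] * (M - 1)   # one word used half the time

print(rate(uniform, n, base=2))  # equals k/n for equally likely words
print(rate(skewed, n, base=2))   # smaller than the uniform value
```

With base 2 the result is in bits per symbol; omitting the base gives nats, as used in the remainder of the derivation.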

With the definition (5), the bound on minimum distance (2)
may be expressed as an explicit function of R, or of M, in the
following manner:

$$ d \le \frac{n}{2}\left[ 1 - \frac{k}{n} + \frac{1}{n}\log_2 (n-k+1) + \frac{2}{n} \right]. \tag{6} $$

From (5),

$$ R = \frac{1}{n} \ln M \ \text{nats}, $$

where ln indicates natural logarithms.  For the group code,
$M = 2^k$ and

$$ R = \frac{1}{n} \ln 2^k = \frac{k}{n} \ln 2, $$

yielding

$$ \frac{k}{n} = \frac{R}{\ln 2} = \frac{\ln M}{n} \cdot \frac{1}{\ln 2}. \tag{7} $$

The relation (7) is then used in (6) to give

$$ d \le \frac{n}{2\ln 2}\left\{ \ln 2 - \frac{1}{n}\ln M + \frac{\ln 2}{n}\bigl[ \log_2 (n-k+1) + 2 \bigr] \right\} \tag{8} $$

as the desired bound on minimum distance.
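As a quick consistency check, the two forms of the distance bound can be evaluated side by side.  This is a minimal sketch that assumes the bracketed constant in (6) and (8) is 2 (the scanned equations are hard to read at that point); the function names and the (7,4) example are illustrative only.

```python
import math

def d_bound_from_k(n, k):
    """Form (6): d <= (n/2) [1 - k/n + (1/n) log2(n-k+1) + 2/n]."""
    return (n / 2) * (1 - k / n + math.log2(n - k + 1) / n + 2 / n)

def d_bound_from_M(n, k, M):
    """Form (8): the same bound with k/n replaced by (ln M)/(n ln 2)."""
    ln2 = math.log(2)
    brace = ln2 - math.log(M) / n + (ln2 / n) * (math.log2(n - k + 1) + 2)
    return n / (2 * ln2) * brace

n, k = 7, 4
print(d_bound_from_k(n, k), d_bound_from_M(n, k, 2 ** k))
```

For n = 7, k = 4 both forms give the same value, which lies above the minimum distance 3 of the (7,4) Hamming code, as an upper bound must.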

In order to use (8) in a geometrical argument, Hamming
distance d is converted to Euclidean distance D by the relation
$d = D^2/4B^2$ developed in Section II.  The final form of the bound
on minimum distance is then

$$ D \le \sqrt{ \frac{2B^2 n}{\ln 2}\left\{ \ln 2 - \frac{1}{n}\ln M + \frac{\ln 2}{n}\bigl[ \log_2 (n-k+1) + 2 \bigr] \right\} }. \tag{9} $$

The  use  of  relation  (9)  requires  that  proper  interpretation 
be  made  of  it.   The  bound  does  not  guarantee  that  the  minimum 
distance  between  the  code  words  in  a  group  code  can  ever  equal  the 
right  side  of  (9).   Rather,  it  only  guarantees  that  the  minimum 
distance  can  never  exceed  the  right  side  of  (9).   Specifically,  (9) 
says  that  in  every  binary  code,  at  least  one  pair  of  code  words  is 
no  farther  apart  than  the  distance  given.   This  fact  is  used  by 
Shannon  in  the  determination  of  an  upper  bound  on  E  for  the  unre- 
stricted case. 
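The interpretation of (9) can be illustrated by brute force on a small code.  The sketch below is an assumption-laden example: the generator matrix is one common systematic form of the (7,4) Hamming code, chosen here for illustration, and the right side of (9) is evaluated with the bracketed constant taken as 2.

```python
import itertools
import math

# Generator matrix of a (7,4) Hamming code in systematic form
# (an illustrative choice, not taken from the text).
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

def codewords(gen):
    """Enumerate all 2^k code words generated by the rows of gen over GF(2)."""
    k, n = len(gen), len(gen[0])
    for msg in itertools.product((0, 1), repeat=k):
        yield tuple(sum(m * row[j] for m, row in zip(msg, gen)) % 2
                    for j in range(n))

def min_euclidean_distance(gen, B):
    """Smallest distance between code words after mapping bits 0/1 to -B/+B."""
    signals = [[B if bit else -B for bit in w] for w in codewords(gen)]
    return min(math.dist(u, v) for u, v in itertools.combinations(signals, 2))

def D_bound(n, k, B):
    """Right side of (9), assuming the constant inside the bracket is 2."""
    ln2 = math.log(2)
    brace = ln2 - k * ln2 / n + (ln2 / n) * (math.log2(n - k + 1) + 2)
    return math.sqrt(2 * B * B * n / ln2 * brace)

B = 1.0
print(min_euclidean_distance(G, B), D_bound(7, 4, B))
```

The closest pair of signal points is separated by $2B\sqrt{3}$, which does not exceed the bound, in line with the statement that at least one pair of code words is no farther apart than the right side of (9).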

Assume a group code with the maximum minimum distance.  There
are two code words no farther apart than D as given by the right side
of (9).  Call these two words $w_1$ and $w_2$.  If one of these words, say
$w_1$, is transmitted, and maximum likelihood decoding is used, then the
contribution to the probability of error of the code if $w_1$ is
incorrectly decoded as $w_2$ is at least equal to the probability that
$w_1$ is carried at least a distance D/2 towards $w_2$.  This
contribution can be expressed as

$$ \frac{1}{M}\, P(w_1 \text{ moves at least } D/2 \text{ in a specified direction}), $$

where 1/M is the probability that $w_1$ will be transmitted.  As
noted in Part C of Section II, the density function of displacement
by noise in any given direction is one-dimensional Gaussian
with zero mean and variance N.  The contribution to the probability
of error is then given by

$$ \frac{1}{M}\,\Phi\!\left( -\frac{D}{2\sqrt{N}} \right) = \frac{1}{M}\,\Phi\!\left( -\sqrt{ \frac{B^2 n}{2N\ln 2}\left\{ \ln 2 - \frac{1}{n}\ln M + \frac{\ln 2}{n}\bigl[ \log_2 (n-k+1) + 2 \bigr] \right\} } \right), $$

where

$$ \Phi(-x) = 1 - \Phi(x) $$

and

$$ \Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt. $$

Since the object here is a lower bound on the probability of
error $P_e$, the contribution to error if $w_1$ is decoded as any
other word except $w_2$ is neglected.  This will result in an
optimistic value for $P_e$, but this is the proper direction for
reasoning to a lower bound.

Assume now that this first word $w_1$ is deleted from the code.
Since there are now (M-1) code words, the maximum minimum distance
cannot be greater than that which would be possible in a code
containing (M-1) code words, i.e.,

$$ D \le \sqrt{ \frac{2B^2 n}{\ln 2}\left\{ \ln 2 - \frac{1}{n}\ln (M-1) + \frac{\ln 2}{n}\bigl[ \log_2 (n-k+1) + 2 \bigr] \right\} }. $$

Then, for the pair with this maximum minimum distance, the argument
as used above will yield a contribution to the probability of
error of at least

$$ \frac{1}{M}\,\Phi\!\left( -\sqrt{ \frac{B^2 n}{2N\ln 2}\left\{ \ln 2 - \frac{1}{n}\ln (M-1) + \frac{\ln 2}{n}\bigl[ \log_2 (n-k+1) + 2 \bigr] \right\} } \right). $$

This process can be continued for (M-1) times, yielding a similar
contribution each time a code word is deleted.  The last two code
words will yield a contribution of

$$ \frac{1}{M}\,\Phi\!\left( -\sqrt{ \frac{B^2 n}{2N\ln 2}\left\{ \ln 2 - \frac{1}{n}\ln 2 + \frac{\ln 2}{n}\bigl[ \log_2 (n-k+1) + 2 \bigr] \right\} } \right). $$
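The probability of error is thus bounded as

$$ \begin{aligned}
P_e \ge{} & \frac{1}{M}\,\Phi\!\left( -\sqrt{ \frac{B^2 n}{2N\ln 2}\left\{ \ln 2 - \frac{1}{n}\ln M + \frac{\ln 2}{n}\bigl[ \log_2 (n-k+1) + 2 \bigr] \right\} } \right) \\
& + \frac{1}{M}\,\Phi\!\left( -\sqrt{ \frac{B^2 n}{2N\ln 2}\left\{ \ln 2 - \frac{1}{n}\ln (M-1) + \frac{\ln 2}{n}\bigl[ \log_2 (n-k+1) + 2 \bigr] \right\} } \right) \\
& + \cdots \\
& + \frac{1}{M}\,\Phi\!\left( -\sqrt{ \frac{B^2 n}{2N\ln 2}\left\{ \ln 2 - \frac{1}{n}\ln 2 + \frac{\ln 2}{n}\bigl[ \log_2 (n-k+1) + 2 \bigr] \right\} } \right).
\end{aligned} \tag{10} $$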

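This expression may be simplified somewhat by weakening the bound.
Note that the terms on the right side of the inequality are decreasing
in value because each argument is greater than the preceding
one and, for x > y, $\Phi(-x) < \Phi(-y)$.  Discard the last (M/2)-1 terms
and replace the first M/2 terms with the last remaining term.  This
last remaining term is

$$ \frac{1}{M}\,\Phi\!\left( -\sqrt{ \frac{B^2 n}{2N\ln 2}\left\{ \ln 2 - \frac{1}{n}\ln \frac{M}{2} + \frac{\ln 2}{n}\bigl[ \log_2 (n-k+1) + 2 \bigr] \right\} } \right). $$

Then

$$ P_e \ge \frac{1}{2}\,\Phi\!\left( -\sqrt{ \frac{B^2 n}{2N\ln 2}\left\{ \ln 2 - \frac{1}{n}\ln \frac{M}{2} + \frac{\ln 2}{n}\bigl[ \log_2 (n-k+1) + 2 \bigr] \right\} } \right). \tag{11} $$

Feller (14) shows that $\Phi(-x)$ is asymptotic to

$$ \frac{1}{x\sqrt{2\pi}}\, e^{-x^2/2}. \tag{12} $$

Since the primary concern here is for reliability, which implies
very large n, this asymptotic result can be used.  If (12) is
applied to (11), there results

$$ P_e \ge \frac{ \exp\!\left( -\dfrac{B^2 n}{4N\ln 2}\left\{ \ln 2 - \dfrac{1}{n}\ln \dfrac{M}{2} + \dfrac{\ln 2}{n}\bigl[ \log_2 (n-k+1) + 2 \bigr] \right\} \right) }{ \left[ \dfrac{4\pi B^2 n}{N\ln 2}\left\{ \ln 2 - \dfrac{1}{n}\ln \dfrac{M}{2} + \dfrac{\ln 2}{n}\bigl[ \log_2 (n-k+1) + 2 \bigr] \right\} \right]^{1/2} }. \tag{13} $$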
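The quality of the asymptotic form (12) can be examined numerically.  The sketch below uses the complementary error function from the Python standard library to evaluate $\Phi(-x)$; the function names and the sample points are illustrative choices.

```python
import math

def phi_tail(x):
    """Phi(-x): probability that a zero-mean, unit-variance Gaussian exceeds x."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def feller_approx(x):
    """Asymptotic form (12): exp(-x^2/2) / (x sqrt(2 pi))."""
    return math.exp(-x * x / 2) / (x * math.sqrt(2 * math.pi))

for x in (1.0, 3.0, 6.0, 10.0):
    # The ratio stays below 1 and approaches 1 as x grows.
    print(x, phi_tail(x) / feller_approx(x))
```

Because the argument of $\Phi$ in (11) grows like $\sqrt{n}$, the ratio is close to 1 for the large n of interest here.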


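To obtain reliability, the definition

$$ E = \lim_{n\to\infty} -\frac{1}{n} \ln P_e $$

is applied to (13):

$$ -\frac{1}{n}\ln P_e \le \frac{B^2}{4N\ln 2}\left\{ \ln 2 - \frac{1}{n}\ln \frac{M}{2} + \frac{\ln 2}{n}\bigl[ \log_2 (n-k+1) + 2 \bigr] \right\} + \frac{1}{n}\ln \left[ \frac{4\pi B^2 n}{N\ln 2}\left\{ \ln 2 - \frac{1}{n}\ln \frac{M}{2} + \frac{\ln 2}{n}\bigl[ \log_2 (n-k+1) + 2 \bigr] \right\} \right]^{1/2}, $$

or

$$ -\frac{1}{n}\ln P_e \le \frac{B^2}{4N\ln 2}\left\{ \ln 2 - \frac{1}{n}\ln M + \frac{1}{n}\ln 2 + \frac{\ln 2}{n}\bigl[ \log_2 (n-k+1) + 2 \bigr] \right\} + \frac{1}{n}\ln \left[ \frac{4\pi B^2 n}{N\ln 2}\left\{ \ln 2 - \frac{1}{n}\ln \frac{M}{2} + \frac{\ln 2}{n}\bigl[ \log_2 (n-k+1) + 2 \bigr] \right\} \right]^{1/2}. $$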


Note that the expression $\frac{1}{n}\ln M$ in the first term on the right is
equal to rate R.  In the limit, those terms which have as a factor
1/n will vanish to give

$$ E \le \frac{B^2}{4N\ln 2}\,(\ln 2 - R). \tag{14} $$

This is the first upper bound on E.
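The passage to the limit can be observed numerically.  The following sketch assumes the finite-n expression has the form reconstructed from (13), with the bracketed constant equal to 2, and compares it with the limiting value (14) at a fixed rate; the function names and sample values are hypothetical.

```python
import math

def first_upper_bound(R, B, N):
    """Relation (14): E <= B^2 (ln 2 - R) / (4 N ln 2), with R in nats."""
    return B * B * (math.log(2) - R) / (4 * N * math.log(2))

def finite_n_exponent(n, k, B, N):
    """Bound on -(1/n) ln P_e from (13) for an (n,k) code, before the limit."""
    ln2 = math.log(2)
    R = k * ln2 / n
    # ln 2 - (1/n) ln(M/2) + (ln 2 / n)[log2(n-k+1) + 2], with M = 2^k
    brace = ln2 - R + ln2 / n + (ln2 / n) * (math.log2(n - k + 1) + 2)
    main = B * B / (4 * N * ln2) * brace
    prefactor = math.log(4 * math.pi * B * B * n / (N * ln2) * brace) / (2 * n)
    return main + prefactor

B, N = 1.0, 1.0
for n in (100, 10_000, 1_000_000):
    # Rate fixed at (ln 2)/2 nats, i.e. k = n/2
    print(n, finite_n_exponent(n, n // 2, B, N), first_upper_bound(math.log(2) / 2, B, N))
```

The finite-n value decreases towards the limit as n grows, since every discarded term carries a factor 1/n or (ln n)/n.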

It  is  noted  again  that  the  Plotkin  bound  (1)  is  applicable 
to  binary  codes  in  general  and  thus  to  binary  linear  codes  which 
are  a  subclass  of  binary  codes.   Hence,  the  bound  on  reliability 
given  by  (14)  applies  to  both  and  so  may  be  considered  to  be  a 
somewhat  stronger  result  than  a  bound  for  binary  linear  codes 
only. 

C.   First  Lower  Bound  on  E(R) 

The first lower bound on E(R) will be derived from an upper
bound on $P_e$.  The distance structure of group codes is again needed,
but for an upper bound on $P_e$, a lower bound on minimum distance will
be required.  Because of the monotone decreasing nature of the Gaussian
density function, a lower bound on minimum distance in a group
code will determine the maximum contribution to $P_e$ from any two code
words.  The sum of the contributions to $P_e$ from all code word pairs
will then yield an upper limit on $P_e$, provided that the summation is
done in an optimistic direction for an upper bound.  This, in brief,
is the outline of the method to be used for the derivation.

The starting point for the derivation of the upper bound
on $P_e$ is a bound on minimum distance for binary codes first found
by Gilbert (15) and refined by Varshamov.  A convenient statement
of the bound for group codes is given by Peterson (16) as follows:

"If

$$ 2^{n-k} \ge 2^{\,n H_2\left( \frac{n+1-d}{n-1} \right)}, \tag{1} $$

where n = total number of digits per code word, k = number of
information digits per code word, and $H_2(x) = -x \log_2 x
- (1 - x) \log_2 (1 - x)$, then there exists a binary (n,k) group
code with minimum Hamming distance d."

The entropy function H(x) is symmetrical in x and (1-x), i.e.,
H(x) = H(1-x).  Then

$$ H_2\!\left( \frac{n+1-d}{n-1} \right) = H_2\!\left( \frac{d-2}{n-1} \right) $$

and (1) can be written

$$ 2^{n-k} \ge 2^{\,n H_2\left( \frac{d-2}{n-1} \right)}. \tag{2} $$

The inequality (2) is in the wrong direction for the desired
lower bound on minimum distance.  Let $k_m$ be the maximum value of k
for which (1) holds.  Then there exists a code with $k_m$ information
symbols and distance d.  Since $k_m$ is chosen to be the maximum, (1)
does not hold for $k = k_m + 1$, so

$$ 2^{n-(k_m+1)} < 2^{\,n H_2\left( \frac{d-2}{n-1} \right)}. \tag{3} $$

The inequality (3) is now in the desired direction.  It can be
further manipulated to arrive at a more suitable form as
follows:

$$ 2^{n-k_m-1} < 2^{\,n H_2\left( \frac{d-2}{n-1} \right)} $$

or

$$ 2^{k} > 2^{\,n\left[ 1 - H_2\left( \frac{d-2}{n-1} \right) \right] - 1}, \tag{4} $$

where the subscript on k has been dropped.  The theorem statement
of (1) guarantees the existence of an (n,k) group code such that (4)
is valid.  For group codes, the number of code words $M = 2^k$.  Hence

$$ M_d \ge 2^{\,n\left[ 1 - H_2\left( \frac{d-2}{n-1} \right) \right] - 1}, \tag{5} $$

where $M_d$ denotes the number of words with minimum distance d.
Equation (5) is interpreted as a lower bound on the number of words in
an (n,k) group code with minimum distance d.  This bound may also
be expressed as a lower bound on rate, since

$$ R = \frac{1}{n} \ln M, $$

giving

$$ e^{nR} = M, $$

which, when placed in (5), yields

$$ e^{nR} \ge 2^{\,n\left[ 1 - H_2\left( \frac{d-2}{n-1} \right) \right] - 1}, \tag{6} $$

from which is obtained the inequality

$$ R \ge \left[ 1 - H_2\!\left( \frac{d-2}{n-1} \right) - \frac{1}{n} \right] \ln 2. \tag{7} $$

The quantity (d - 2)/(n - 1) will approach d/n for large n and d,
and the term 1/n will become negligible, so that (7) may be
written

$$ R \ge \left[ 1 - H_2\!\left( \frac{d}{n} \right) \right] \ln 2 $$

or

$$ R \ge \ln 2 - H_e\!\left( \frac{d}{n} \right), \tag{8} $$

where $H_e(x)$ indicates that the logarithms are to be taken to the
base e.  The relation (8) may be expressed in terms of Euclidean
distance as

$$ R \ge \ln 2 - H_e\!\left( \frac{D^2}{4B^2 n} \right). \tag{9} $$
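The step from the finite-length rate bound to its large-n form can be checked with a few lines of code.  This is a sketch only; the function names and the sample values of n and d are illustrative, and the relation numbers refer to the rate bounds of this part in their exact form, $R \ge [1 - H_2((d-2)/(n-1)) - 1/n] \ln 2$, and asymptotic form, $R \ge \ln 2 - H_e(d/n)$.

```python
import math

def H_e(x):
    """Entropy function with natural logarithms: -x ln x - (1-x) ln(1-x)."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log(x) - (1 - x) * math.log(1 - x)

def rate_bound_exact(n, d):
    """Finite-n form; note that H_2(x) ln 2 = H_e(x)."""
    return math.log(2) - H_e((d - 2) / (n - 1)) - math.log(2) / n

def rate_bound_asymptotic(delta):
    """Large-n form with delta = d/n."""
    return math.log(2) - H_e(delta)

n, d = 10_000, 1_100
print(rate_bound_exact(n, d), rate_bound_asymptotic(d / n))
```

At d/n = 0.11 the bound is close to (ln 2)/2 nats, that is, a rate of about one half in binary units.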


The bound (5) establishes that, in an (n,k) group code,
there are at least $M_d$ words which are separated by distance d,
where $M_d$ is given by the right side of (5).  An upper bound on $P_e$
for such a code can be obtained by adding up the probability that
each code word will be decoded as each of the remaining code words.
Consider that word $w_1$ is transmitted.  Since maximum likelihood
decoding is used, the probability of error if $w_1$ is decoded as,
say, $w_2$, is the product of the probability that $w_1$ is transmitted
($1/M_d$, since all words are equally likely) and the probability that
$w_1$ moves at least halfway towards $w_2$.  Since it is known that $w_2$ is
at least D distance from $w_1$, this latter probability is no
more than $\Phi(-D/2\sqrt{N})$.  Note that it is here that the lower bound
on distance is used.  The word $w_2$ could be farther away than D,
but in that event the probability of $w_1$ moving halfway towards $w_2$
would be less.  Hence, the most pessimistic case is taken.  The
contribution to $P_e$ by $w_1$ being decoded as $w_2$ can then be written as

$$ \frac{1}{M_d}\,\Phi\!\left( -\frac{D}{2\sqrt{N}} \right). $$

There are at least $M_d$ code words, each of which could be paired
(as an ordered pair) with $(M_d - 1)$ other code words.  Hence

$$ P_e \le M_d\,(M_d - 1)\,\frac{1}{M_d}\,\Phi\!\left( -\frac{D}{2\sqrt{N}} \right). $$

The inequality is preserved and simplified if this is written as

$$ P_e \le M_d\,\Phi\!\left( -\frac{D}{2\sqrt{N}} \right). \tag{10} $$

Since

$$ M_d = e^{nR}, $$

(10) may then be written

$$ P_e \le e^{nR}\,\Phi\!\left( -\frac{D}{2\sqrt{N}} \right). \tag{11} $$

As noted earlier, $\Phi(-x)$ is overbounded by

$$ \frac{1}{x\sqrt{2\pi}}\, e^{-x^2/2}. $$

This may be applied to (11), giving

$$ P_e \le \frac{2\sqrt{N}}{D\sqrt{2\pi}}\; e^{nR}\, e^{-D^2/8N}, $$

which simplifies to

$$ P_e \le \sqrt{\frac{2N}{\pi D^2}}\; e^{\,nR - D^2/8N}. \tag{12} $$

It  is  apparent  from  (12)  that  the  tightest  bound  will  occur  when 
R  as  given  by  the  right  side  of  (9)  takes  on  its  minimum  value. 

The inequality (12) is an upper bound on $P_e$ with D as a
parameter.  It will be more convenient to express D as a function
of n by use of the relation

$$ D = \lambda K(n), \tag{13} $$

where K(n) is the maximum value which D can assume.  This maximum
value can be determined from the Hamming-Euclidean distance relation,

$$ D = 2B\sqrt{d}. $$

Since d is at most n, the maximum Euclidean distance for an n-digit
code length is thus constrained to be $2B\sqrt{n}$.  However, in Section IV
of this paper, a comparison between these results and those of the
unrestricted theory will be made.  In developing a similar bound for
the general case, Shannon found it expedient to restrict the maximum
code word separation to $\sqrt{2B^2 n}$, which is less than the maximum
by a factor of $\sqrt{2}$.  This restriction in effect is a lower bound on
rate and permitted a simpler bounding technique.  In order to make a
meaningful comparison, the maximum code word separation for the binary
case will be assumed to be the same.  Then, $K(n) = \sqrt{2B^2 n}$, and

$$ D = \lambda \sqrt{2B^2 n}. \tag{14} $$

The equation (12) can be written, using (14), as

$$ P_e \le \frac{1}{\lambda}\sqrt{\frac{N}{\pi B^2 n}}\; e^{\,n\left[ R - \frac{\lambda^2 B^2}{4N} \right]}. \tag{15} $$

Reliability, which is the asymptotic (as $n \to \infty$) value of the
negative of the exponent in (15), is then bounded as

$$ E \ge \frac{\lambda^2 B^2}{4N} - R. \tag{16} $$

This is the first lower bound on reliability.

If the relation (14) is used in (9), then R is bounded as

$$ R \ge \ln 2 - H_e\!\left( \frac{\lambda^2}{2} \right). \tag{17} $$

The equations (16) and (17) can be used to determine E(R) through
$\lambda$ as $\lambda$ varies from 0 to 1.
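The parametric use of (16) and (17) can be sketched as follows.  This is an illustrative example, not the author's computation: both relations are taken with equality to trace the boundary curve, and the values $B^2/N = 9$ and the number of steps are hypothetical.

```python
import math

def H_e(x):
    """Entropy function with natural logarithms."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log(x) - (1 - x) * math.log(1 - x)

def first_lower_bound_curve(B, N, steps=5):
    """Points (lambda, R, E) from (17) and (16), taken with equality."""
    points = []
    for i in range(1, steps + 1):
        lam = i / steps
        R = math.log(2) - H_e(lam * lam / 2)      # relation (17), nats
        E = lam * lam * B * B / (4 * N) - R       # relation (16)
        points.append((lam, R, E))
    return points

for lam, R, E in first_lower_bound_curve(B=3.0, N=1.0):
    print(f"lambda={lam:.1f}  R={R:.4f}  E={E:.4f}")
```

At $\lambda = 1$ the rate is zero and the bound equals $B^2/4N$; for small $\lambda$ the computed E is negative, so the bound carries information only over the part of the curve where E is positive.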

It should be pointed out here that the specialization of a
lower bound on reliability to group codes is desirable.  Group
codes are a sub-class of the class of all binary codes.  Thus, if
a group code exists which will give this reliability, certainly a
binary code (the same code) exists which will do as well.

The bound obtained by the argument which was used in this
section is sometimes confusing because of the simultaneous bounding
of M and $P_e$.  An over-all interpretation may be of help in
understanding what has happened.  The basic relation for the bound
on $P_e$ is

$$ P_e \le M_d\,\Phi\!\left( -\frac{D}{2\sqrt{N}} \right). \tag{10} $$

This says that the true value of $P_e$ is less than the quantity on
the right and that the quantity on the right increases directly as
$M_d$.  Call this quantity on the right of (10) $P_g$.  Further, $M_d$ is
bounded from below by (5) as

$$ M_d \ge 2^{\,n\left[ 1 - H_2\left( \frac{d-2}{n-1} \right) \right] - 1}. $$

Call the quantity on the right $M_g$ and note that $M_g$ is a function
of d.  Figure 1 shows $P_g$ plotted as a function of $M_g$.  The point
labeled $d_0$ on the curve in Figure 1 shows a particular bound value
for $P_g$ when a value $d_0$ is used to calculate $M_g$.  Now, $P_e \le P_g$
says that the true value of $P_e$ must lie below the point $d_0$, and
$M_d \ge M_g$ says that the true value of $M_d$ is to the right.  Since
the curve slopes up to the right, the true value definitely lies
below the curve.  Point A is a lower bound on M for a fixed value
of $P_e$, and point B is an upper bound on $P_e$ for a fixed value
$M_d$ of M.

D.   Second  Upper  Bound  on  E(R)

In the theory of discrete codes for binary symmetric
channels, an optimum lower bound on the probability of error has
been clearly established.  This bound is known variously as the
Sphere-Packing Bound or Hamming Bound.  The bound is due originally
to Hamming (17) and has been restated in more modern and convenient
form by Peterson (18) and Fano (19).  Briefly, the bound states that
the probability of error for the best possible (n,k) code can be no
smaller than that for a hypothetical code called a quasi-perfect
m-error correcting code.  A quasi-perfect code is one which corrects
all combinations of m or fewer errors, some of (m + 1) errors and no
combinations of greater than (m + 1) errors.  The term hypothetical
is used because such a quasi-perfect code may not exist for all
combinations of n and k.

Peterson (18) has stated the Hamming Bound as an upper bound
on minimum distance for an (n,k) group code.  It should be noted that
the bound actually applies to all binary codes.  This bound is given by

$$ 1 - \frac{k}{n} \ge H_2\!\left( \frac{m}{n} \right), \tag{1} $$

where n is code word length, k is the number of information digits
in the code word, m specifies the number of errors which the code
guarantees to correct, and $H_2(x)$ is the binary entropy function.
A change of base in logarithms, plus the application of the general
definition for rate, yields

$$ \ln 2 - \frac{1}{n}\ln M \ge H_e\!\left( \frac{m}{n} \right), \tag{2} $$

where M is the number of words in the code.

If a code is an m-error correcting code, then the minimum
distance is at least

$$ d = 2m + 1, $$

which gives

$$ m = \frac{d - 1}{2} $$

or, for large values of d,

$$ m \approx \frac{d}{2}. \tag{3} $$

If the Euclidean-Hamming distance conversion is applied to (3),
the bound (2) may be written

$$ \ln 2 - \frac{1}{n}\ln M \ge H_e\!\left( \frac{D^2}{8B^2 n} \right). \tag{4} $$

The argument to be used for obtaining a lower bound on
error probability is a geometrical argument similar to that used in
Part B of this section.  However, the bound on distance (4) is
transcendental, and no explicit solution for D in closed form is
attainable.  Rather than attempt to solve (4) by approximation or
by numerical techniques, an "inverse function" approach will be used.

From (3), the value of the argument of $H_e$ is d/2n, approximately.
Since d is Hamming distance, the maximum value of d is
n, i.e., two code words can differ at most in every position.  Thus,
the maximum value of the argument of $H_e$ is 0.5.  $H_e(x)$ is monotone
increasing over the range 0 < x < 0.5.  Hence, it is single
valued over this range, and an inverse function $H_e^{-1}(x)$ can be
defined.  $H_e^{-1}(x)$ is a number y for which $H_e(y) = x$.  Using this
inverse notation, the bound (4) may be expressed as

$$ \frac{D^2}{8B^2 n} \le H_e^{-1}\!\left( \ln 2 - \frac{1}{n}\ln M \right) \tag{5} $$

or

$$ D \le \left[ 8B^2 n\, H_e^{-1}\!\left( \ln 2 - \frac{1}{n}\ln M \right) \right]^{1/2}. \tag{6} $$

Now, $H_e^{-1}$ is monotone increasing because $H_e$ is.  Hence, as the value
of M in (6) decreases, the bound value of D will increase.  Therefore,
the same technique as was used in Part B of this section, i.e.,
the determination of contributions to the probability of error by
successive deletion of code words at maximum minimum distance and
subsequent modification of the distance expression, can be used
now on (6) to obtain a lower bound on $P_e$.  These calculations are
straightforward and yield as a bound

$$ P_e \ge \frac{1}{2}\,\Phi\!\left( -\left[ \frac{2B^2 n}{N}\, H_e^{-1}\!\left( \ln 2 - R + \frac{1}{n}\ln 2 \right) \right]^{1/2} \right). \tag{7} $$

When the asymptotic approximation of $\Phi(-x)$ is applied, there
results

$$ P_e \ge \frac{ \exp\!\left[ -\dfrac{B^2 n}{N}\, H_e^{-1}\!\left( \ln 2 - R + \dfrac{1}{n}\ln 2 \right) \right] }{ \left[ \dfrac{16\pi B^2 n}{N}\, H_e^{-1}\!\left( \ln 2 - R + \dfrac{1}{n}\ln 2 \right) \right]^{1/2} }. \tag{8} $$

The established definition for reliability will yield

$$ E \le \frac{B^2}{N}\, H_e^{-1}(\ln 2 - R). \tag{9} $$

The relation (9) is the second upper bound on E(R).  As was
true in the case of the first upper bound, (9) is actually
applicable to all binary codes.

E.   Second  Lower  Bound  on  E(R)

Kennedy and Wozencraft (20) have established a random coding
bound for discrete, memoryless channels.  This bound is an outgrowth
of work by Fano and Gallager.  It states that it is possible to
code and decode data for such a channel with a probability of error
bounded by

$$ P_e \le 2^{-n(E_0 - R)}, \tag{1} $$

where n is the code word length, R is the rate in bits per second,
and $E_0$ is the reliability obtained when only two code words are in
the message ensemble for the channel, i.e., M = 2.  A brief outline
of the argument leading to this bound will be given, since the method
of Kennedy and Wozencraft was adapted for the derivation of a similar
bound for the time-discrete continuous channel.

Suppose that, from the set of n-tuples over the field of two
elements, two code words are chosen at random.  Define the probability
of error if either of these words is transmitted as

$$ P_{e_0} = 2^{-nE_0}. $$

Now, choose M words at random.  The probability of error if the
first (or any one) of these is transmitted is

$$ P_e = P(\text{the received message is closer to word 2 than word 1, or the received message is closer to word 3 than word 1, or} \cdots). $$

The probability of a union of events is overbounded by the sum of
the probabilities of the component events.  Hence

$$ \begin{aligned}
P_e \le{} & P(\text{received message is closer to word 2 than word 1}) \\
& + P(\text{received message is closer to word 3 than word 1}) + \cdots
\end{aligned} $$

But each component probability is the probability that an error
will occur in the transmission of two randomly chosen code words,
since all code words were chosen at random.  Thus

$$ P_e \le (M - 1)\, 2^{-nE_0} < M\, 2^{-nE_0}. \tag{2} $$

Rate R in bits is given by

$$ R = \frac{1}{n} \log_2 M. $$

Then

$$ M = 2^{nR} $$

and (2) can be written

$$ P_e \le 2^{-n(E_0 - R)}, $$

which is the Kennedy-Wozencraft bound.  The customary definition
of reliability gives

$$ E \ge E_0 - R. \tag{3} $$

Note that defining the probability of error of transmitting one of
two randomly chosen code words as

$$ P_{e_0} = e^{-nE_0} $$

will result in the same expression, (3), for reliability, where R
will then be in natural units.

A method will now be given for obtaining a bound on $E_0$ which
will result in a random coding bound for the time-discrete channel
with ±B as input signal levels.  This approach removes the restriction
that the input signal sequences form a binary linear code.  The
bound will apply to the entire class of binary codes, i.e., codes
selected from the complete vector space of n-tuples over the field
of two elements.  This method requires a simplified bound on $\Phi(-x)$
and an evaluation, for the time-discrete channel, of the probability
of error for two randomly chosen code words.

It can be shown that

$$ \Phi(-x) \le \frac{1}{2}\, e^{-x^2/2}, \qquad 0 \le x \le \infty, \tag{4} $$

in the following manner:  Let

$$ D(x) = \Phi(-x) - \frac{1}{2}\, e^{-x^2/2}; $$

then, the inequality (4) is valid if $D(x) \le 0$ over the indicated
range.  By direct substitution, D(x) = 0 when x = 0, and when
$x = \infty$.  Now,

$$ \frac{dD}{dx} = e^{-x^2/2}\left( \frac{x}{2} - \frac{1}{\sqrt{2\pi}} \right), \tag{5} $$

which has only one root at $x = 2/\sqrt{2\pi}$.  Also, dD/dx exists over
the entire range.  Thus, D(x) has only one relative extremum and no
crossings between 0 and $\infty$.  A test of any sample value shows D(x)
is negative in the indicated range; hence, $D(x) \le 0$ for $0 \le x \le \infty$.
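The argument can be confirmed numerically.  The sketch below evaluates the difference $\Phi(-x) - \frac{1}{2}e^{-x^2/2}$ on a grid and locates its minimum; the name `gap` and the grid spacing are illustrative choices.

```python
import math

def phi_tail(x):
    """Phi(-x) for a zero-mean, unit-variance Gaussian."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def gap(x):
    """Phi(-x) - (1/2) exp(-x^2/2), which should never be positive for x >= 0."""
    return phi_tail(x) - 0.5 * math.exp(-x * x / 2)

grid = [i / 100 for i in range(0, 1001)]     # x from 0 to 10
x_star = 2 / math.sqrt(2 * math.pi)          # the single root of the derivative
x_min = min(grid, key=gap)                   # grid point where the gap is most negative

print(max(gap(x) for x in grid), x_min, x_star)
```

The largest value of the difference on the grid is zero, attained at x = 0, and the most negative value occurs next to the stationary point $2/\sqrt{2\pi}$.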

Consider that two binary code words are chosen at random.
If these two words differ by a distance d, then, still assuming a
binary signal level ±B, they are separated by Euclidean distance
$D = 2B\sqrt{d}$.  If one of these is transmitted, the probability of
error in decoding is no greater than the probability that one of
them is moved by noise at least D/2, i.e.,

$$ P_{e_d} \le \Phi\!\left( -\frac{D}{2\sqrt{N}} \right) = \Phi\!\left( -B\sqrt{\frac{d}{N}} \right), $$

or

$$ P_{e_d} \le \frac{1}{2}\, e^{-B^2 d/2N}, \tag{6} $$

where (4) has been used.

Since each code word is chosen at random, the probability
of choosing any one is $2^{-n}$.  There are $\binom{n}{d}$ words which will differ
in d places from the first chosen.  Thus, the probability that the
two code words differ in d places is

$$ 2^{-n} \binom{n}{d}. $$
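The average probability of error in two randomly chosen code words
is then (6) averaged over all possible d, i.e.,

$$ P_{e_0} \le \sum_{d=0}^{n} 2^{-n} \binom{n}{d}\, \frac{1}{2}\, e^{-B^2 d/2N} = \frac{1}{2^{\,n+1}} \sum_{d=0}^{n} \binom{n}{d}\, e^{-B^2 d/2N} = \frac{1}{2^{\,n+1}} \left( 1 + e^{-B^2/2N} \right)^{n}, $$

or

$$ P_{e_0} \le \frac{1}{2} \left[ \frac{1 + e^{-B^2/2N}}{2} \right]^{n}. \tag{7} $$

By definition,

$$ P_{e_0} = e^{-nE_0}; $$

when this is used in (7), there results

$$ e^{-nE_0} \le \frac{1}{2} \left[ \frac{1 + e^{-B^2/2N}}{2} \right]^{n} $$

or, since the factor 1/2 may be dropped without invalidating the
inequality,

$$ E_0 \ge -\ln \left[ \frac{1 + e^{-B^2/2N}}{2} \right]. \tag{8} $$

The inequality (8) may be applied in (3) to give the second
lower bound on reliability as

$$ E \ge -\ln \left[ \frac{1 + e^{-B^2/2N}}{2} \right] - R. \tag{9} $$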
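The averaging step and the resulting bound can be verified directly.  The following sketch is illustrative only: the function names and the sample values n = 20, B = N = 1 are assumptions, and the closed form is the binomial-theorem evaluation of the average of $\frac{1}{2}e^{-B^2 d/2N}$ over the distance distribution $2^{-n}\binom{n}{d}$.

```python
import math

def pairwise_bound(d, B, N):
    """(1/2) exp(-B^2 d / (2N)): bound on the error for two words at distance d."""
    return 0.5 * math.exp(-B * B * d / (2 * N))

def average_pairwise_bound(n, B, N):
    """Average of the pairwise bound over the distribution 2^-n C(n, d) of d."""
    return sum(math.comb(n, d) * 2.0 ** (-n) * pairwise_bound(d, B, N)
               for d in range(n + 1))

def closed_form(n, B, N):
    """(1/2) [(1 + exp(-B^2/(2N))) / 2]^n, from the binomial theorem."""
    return 0.5 * ((1 + math.exp(-B * B / (2 * N))) / 2) ** n

def E0_bound(B, N):
    """Lower bound on E_0: -ln[(1 + exp(-B^2/(2N))) / 2]."""
    return -math.log((1 + math.exp(-B * B / (2 * N))) / 2)

def second_lower_bound(R, B, N):
    """E >= E_0 - R with the bound on E_0 inserted (R in nats)."""
    return E0_bound(B, N) - R

print(average_pairwise_bound(20, 1.0, 1.0), closed_form(20, 1.0, 1.0))
print(E0_bound(1.0, 1.0), second_lower_bound(0.1, 1.0, 1.0))
```

The bound on $E_0$ never exceeds ln 2, the value it approaches as $B^2/N$ becomes large, so this lower bound on E is positive only for rates below ln 2 nats.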


SECTION  IV 

COMPARISON  WITH  THE  UNRESTRICTED  THEORY 

A.   Preliminary  Remarks 

Prior  to  a  comparison  of  the  results  of  this  investigation 
and  the  unrestricted  theory  of  Shannon  (4),  a  summary  and  explana- 
tion of  the  unrestricted  bounds  is  in  order.   It  should  be  remem- 
bered that  in  the  unrestricted  case  the  channel  input  sequence  is 
considered  to  be  n  real  numbers,  subject  only  to  the  constraint 
that  the  power  P  in  each  sequence  of  n  numbers  be  identical. 

Figure 2 depicts the four unrestricted bounds on reliability
plotted as a function of rate R, with the parameter $A = (P/N)^{1/2}
= 3$.  The power P will correspond to $B^2$ in the binary coding case.
N is as before the average noise power.  For ease of discussion the
bounds have been labeled $E_1$ through $E_4$.

Bound $E_1$ is the bound on reliability when $P_e$ is obtained
by the sphere-packing method.  The quantitative expression for this
bound is complex, but it will be recorded here for interest.

$$ E_1 = \frac{A^2}{2} - \frac{1}{2} A G \cos\theta - \ln (G \sin\theta), \tag{1} $$

where

$$ G = \frac{1}{2}\left( A\cos\theta + \sqrt{A^2\cos^2\theta + 4} \right) \tag{2} $$

and $\theta$ is a function of rate through the expression

$$ e^{-R} = \sin\theta. \tag{3} $$

The bound (1) is valid for ranges of $\theta$ greater than $\theta_0$, which is
the value of $\theta$ which corresponds to channel capacity, i.e., for
$\theta \ge \theta_0 = \sin^{-1}(1/\sqrt{1 + A^2})$.  When $\theta = \theta_0$, then $E_1 = 0$, which is
the correct value of reliability at channel capacity.  $E_1$ represents
the highest possible reliability attainable with the unrestricted
coding scheme.  However, its derivation is based on all possible
codes which could be formed.  It provides an upper bound on reliability
for all ranges of R, but at low values of R it is found that
this bound cannot be achieved.
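The statement that the sphere-packing bound vanishes at channel capacity can be checked numerically.  The sketch below assumes the expression $E_1 = A^2/2 - \frac{1}{2}AG\cos\theta - \ln(G\sin\theta)$ with $G = \frac{1}{2}(A\cos\theta + \sqrt{A^2\cos^2\theta + 4})$ and $e^{-R} = \sin\theta$; the function names are hypothetical, and A = 3 is the value quoted for Figure 2.

```python
import math

def G(theta, A):
    """G = (1/2)(A cos(theta) + sqrt(A^2 cos^2(theta) + 4))."""
    c = math.cos(theta)
    return 0.5 * (A * c + math.sqrt(A * A * c * c + 4))

def E1(theta, A):
    """Sphere-packing bound on reliability as a function of theta."""
    g = G(theta, A)
    return A * A / 2 - 0.5 * A * g * math.cos(theta) - math.log(g * math.sin(theta))

def theta_from_rate(R):
    """Invert exp(-R) = sin(theta) for theta in (0, pi/2]."""
    return math.asin(math.exp(-R))

A = 3.0
theta0 = math.asin(1 / math.sqrt(1 + A * A))
capacity = -math.log(math.sin(theta0))       # rate corresponding to theta0

print(capacity, E1(theta0, A))               # bound is zero at capacity
print(E1(theta_from_rate(0.5), A))           # positive below capacity
```

For A = 3 the rate corresponding to $\theta_0$ is $\frac{1}{2}\ln(1 + A^2)$, about 1.15 nats.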

The bound $E_3$ is an upper bound on reliability which is
sharper than $E_1$ at low transmission rates.  $E_3$ is independent of
rate, and is given by

$$ E \le \frac{P}{4N}. \tag{4} $$

The lower bound $E_2$ is one which is obtained when $P_e$ is
calculated using a random coding technique.  $E_2$ is actually given by
two different expressions, according to whether R is greater
than or less than a value called critical rate, $R_c$.  The concept of
critical rate occurs in coding theory whenever general derivations
of upper bounds on $P_e$ are obtained with a random coding argument.
It is critical only in the sense that the nature of the bounds for
$P_e$ and E are different for ranges of R on either side of $R_c$.  When
$R > R_c$, the asymptotic lower and upper bounds on $P_e$ differ only by
a multiplying factor which is a function of rate.  Thus, for
$R \ge R_c$, the reliability E, which is an exponent, is the same for
both bounds on $P_e$.  In this range, then, $E_2$ is exactly the same as
$E_1$ and is given by (1).  For rates less than $R_c$, $E_1$ and $E_2$ diverge.
In this range, $E_2$ is given by

$$ E_2 = E_1(\theta_c) + (R_c - R), \tag{5} $$

where

$$ E_1(\theta_c) = \frac{A^2}{2} - \frac{1}{2} A G \cos\theta_c - \ln (G \sin\theta_c) \tag{6} $$

is the value of $E_1$ at the critical rate, G is given by (2), and
$\theta_c$ is a function of critical rate through equation (3).  The value
of $\theta_c$ is the solution of $2\cos\theta_c = AG\sin^2\theta_c$.  Despite the
complexity of the equations required to determine $E_1(\theta_c)$ and $R_c$,
the bound $E_2$ is linear over the range $0 \le R \le R_c$.
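The critical angle has no closed form but is easily found numerically.  The sketch below is one possible approach, using bisection on $2\cos\theta - AG\sin^2\theta$ over $(0, \pi/2)$; the function names are hypothetical, A = 3 is the value quoted for Figure 2, and the sphere-packing expression is assumed to be $A^2/2 - \frac{1}{2}AG\cos\theta - \ln(G\sin\theta)$.

```python
import math

def G(theta, A):
    c = math.cos(theta)
    return 0.5 * (A * c + math.sqrt(A * A * c * c + 4))

def E1(theta, A):
    g = G(theta, A)
    return A * A / 2 - 0.5 * A * g * math.cos(theta) - math.log(g * math.sin(theta))

def critical_angle(A, tol=1e-12):
    """Solve 2 cos(theta) = A G sin^2(theta) on (0, pi/2) by bisection."""
    def f(t):
        return 2 * math.cos(t) - A * G(t, A) * math.sin(t) ** 2
    lo, hi = 1e-9, math.pi / 2       # f(lo) > 0 and f(hi) < 0 for A > 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def E2(R, A):
    """Random coding bound: equal to E1 for R >= R_c, linear in R below R_c."""
    theta_c = critical_angle(A)
    R_c = -math.log(math.sin(theta_c))
    if R >= R_c:
        return E1(math.asin(math.exp(-R)), A)
    return E1(theta_c, A) + (R_c - R)

A = 3.0
theta_c = critical_angle(A)
R_c = -math.log(math.sin(theta_c))
print(theta_c, R_c, E2(0.0, A), E2(R_c, A))
```

Below the critical rate the computed bound falls by exactly one unit of E per unit of R, which is the linear behaviour described above.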

For low values of R, $E_2$ is sharpened by $E_4$.  The bound $E_4$
is given by

$$ E \ge \frac{\lambda^2 P}{4N} - R, $$

where

$$ R = \ln \frac{1}{\sin\left( 2 \sin^{-1} \dfrac{\lambda}{\sqrt{2}} \right)} $$

and

If $B^2$ is substituted for P, $E_4$ has the same form as the first lower
bound on reliability as developed in Section III, Part C, of this
paper.  However, the expression for rate R is considerably different.

The curves in Figure 2 give a typical picture of the bounds
obtained by the unrestricted theory.  For rates less than the critical
rate $R_c$, the bounds enclose an area within which the reliability
must lie.  For rates greater than $R_c$, the reliability is given by a
single curve.  Thus, for rates near zero and rates near to or greater
than $R_c$, the reliability is determined fairly sharply.

All the bounds on reliability, as typified by Figure 2,
are derived for the best code which the unrestricted technique can
achieve. This fact requires that a slightly different interpretation
be made of upper and lower bounds. Bounds $E_1$ and $E_3$ say, in effect,
that regardless of what type of code is attempted, within the
limits set by the unrestricted coding technique, no higher reliability
can be achieved. The best that can be done is to meet these
bounds. Hence, it is expected that a restrictive coding technique
such as that used in this investigation may fall short of these
bounds. The lower bounds on reliability, $E_2$ and $E_4$, are the result
of a random coding technique. This technique rests on the following
reasoning: assume that each code is formed by selecting at random M
code words out of the total ensemble of possible message sequences.
The average probability of error over all such possible codes is then
calculated. Since the average probability of error over all such codes
is then known, the best code must have a probability of error which is
no greater than the average. This is the key point in random code
bounding. It is asserted that there exists a code which will give a
probability of error, and hence a reliability, at least as good as the
average. There is no indication of how this code is to be found. The
binary technique is a special case of the unrestricted technique. In
evaluating reliability for this case, the question being asked is, in
effect: is this one of the codes capable of achieving the random
coding bound?
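
The random coding argument can be illustrated numerically. The following is a minimal sketch and not part of the original investigation: it assumes a binary symmetric channel with minimum-distance decoding as a stand-in for the Gaussian channel, and the block length, code size, crossover probability, and function names are all illustrative choices. It draws codes at random, computes the exact error probability of each, and compares the best code drawn with the ensemble average.

```python
import itertools
import random


def error_probability(code, n, p):
    """Exact average block-error probability of `code` on a BSC with
    crossover probability p, using minimum-Hamming-distance decoding.
    Ties are resolved in favor of the first-listed nearest word."""
    total = 0.0
    for received in itertools.product((0, 1), repeat=n):
        dists = [sum(a != b for a, b in zip(received, w)) for w in code]
        decoded = dists.index(min(dists))
        for sent, d in enumerate(dists):
            if sent != decoded:
                # Probability of this received word given word `sent`.
                total += (p ** d) * ((1 - p) ** (n - d))
    return total / len(code)  # code words assumed equally likely


def random_code(n, m, rng):
    """Select m code words of length n at random (repeats allowed)."""
    return [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(m)]


rng = random.Random(0)
n, m, p = 7, 4, 0.05  # illustrative values only
errors = [error_probability(random_code(n, m, rng), n, p) for _ in range(200)]
average = sum(errors) / len(errors)
best = min(errors)
print(f"ensemble average P_e = {average:.4f}, best code drawn P_e = {best:.4f}")
```

The minimum over any collection can never exceed its mean, which is all the argument needs; as the text notes, the argument gives no recipe for finding the good code other than search.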

B.   A Comparison of Upper Bounds

Figures 3 through 8 show the first and second upper bounds
on reliability plotted as lines A and B, respectively. These bounds
are plotted against rate for various values of the parameter
$A = (B^2/N)^{1/2}$. The first upper bound is given by

$$E \le \frac{A^2}{4} \left( 1 - \frac{R}{\ln 2} \right).$$

The second upper bound is given by

$$E \le A^2 H_e^{-1}(\ln 2 - R).$$

The two bounds may be considered to have different ranges of
validity. For $R < 0.25$ the first upper bound is lower and hence
predominates. For $R \ge 0.25$ the second upper bound is lower and
is thus the prevailing one.
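
The crossover between the two upper bounds can be checked numerically. The sketch below assumes the first upper bound has the form $E \le (A^2/4)(1 - R/\ln 2)$ and the second the form $E \le A^2 H_e^{-1}(\ln 2 - R)$, with $H_e$ the binary entropy in nats and $H_e^{-1}$ its inverse on $[0, 1/2]$; the function names and the bisection tolerances are illustrative assumptions. Since both bounds scale with $A^2$, the crossover rate does not depend on $A$.

```python
import math


def entropy_nats(x):
    """Binary entropy H_e(x) in nats, with H_e(0) = H_e(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log(x) - (1 - x) * math.log(1 - x)


def inverse_entropy_nats(h, tol=1e-12):
    """Return x in [0, 1/2] with H_e(x) = h, found by bisection."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if entropy_nats(mid) < h:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def first_upper_bound(rate, a):
    """Assumed form of the first upper bound on reliability."""
    return (a * a / 4) * (1 - rate / math.log(2))


def second_upper_bound(rate, a):
    """Assumed form of the second upper bound on reliability."""
    return a * a * inverse_entropy_nats(math.log(2) - rate)


def crossover_rate(tol=1e-9):
    """Rate at which the two bounds are equal, found by bisection."""
    lo, hi = 0.05, 0.6
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if first_upper_bound(mid, 1.0) < second_upper_bound(mid, 1.0):
            lo = mid  # first bound still the lower one; move right
        else:
            hi = mid
    return (lo + hi) / 2


print(f"bounds cross near R = {crossover_rate():.3f} nats")
```

Under these assumed forms the crossing should land close to the value of 0.25 read from the plotted curves.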

For all values of A, these bounds are lower than the low
rate upper bound for the unrestricted case. For values of A
greater than 3, approximately, these bounds are also lower than
the sphere-packing bound for the unrestricted case. It is in this
range that the derived bounds give the greatest information. It
is seen that as A increases beyond 3, the reliability of the binary
code is bounded away from the optimum, represented by the
sphere-packing bound. The bounds show clearly that the binary coding
technique is restricted to a maximum signalling rate of ln 2.

For values of A less than 3, the upper bounds convey less
information. In this range of A, the first upper bound lies under
the sphere-packing bound for low ranges of R and intersects that
bound at a value of R always less than $R_c$. The second upper bound
lies above the sphere-packing bound except at very low R and at
R near ln 2. The amount of definite information given about the
binary technique for small A is less than that available for
higher A.

Where the sphere-packing bound is higher than the first and
second upper bounds, the difference between them may be used as
a crude measure of the loss in reliability resulting from the use
of the binary technique. It is not a very accurate measure because
there is no guarantee that the first and second upper bounds can be
met, that is, that a code can be found which will achieve this
reliability, while with the unrestricted technique, the optimum
bound could be very closely approached, if not actually reached.
To do so might require that many levels of signal quantization be
available, but this is, in theory, no barrier.

In all situations where the first and second upper bounds
convey information, they indicate that the binary coding technique
can give, at best, less reliability than the optimum unrestricted
code. The degree of this degradation increases as the signal to
noise ratio increases. In all cases, the picture is not sharply
defined at very low rates. Each figure shows that the sphere-packing
bound, which is of exponential shape, is sharply truncated by
the low rate bound for the unrestricted case. The resulting
unrestricted upper bound does not appear to be a realistic or
naturally achievable limit. A reasonable estimate is that the
true upper bound lies between the first upper bound for the binary
case and the truncated unrestricted bound, in which case the binary
case will not be bounded as far away from the optimum.

The second upper bound was derived from the Hamming distance
bound for group codes. A group code for the Binary Symmetric
Channel which meets the Hamming bound is optimum, i.e., no code
can be found with the same n and k which has a lower probability
of error. It has not been established that the second upper bound
is optimum for the channel investigated herein. However, the
second upper bound does serve as a guide to the performance to be
expected when an optimum group code is used in the time-discrete
channel.

C.   A Comparison of Lower Bounds

1.   The  First  Lower  Bound 

The first lower bound is plotted as curve C on Figures 4
through 8. The bound was derived in terms of minimum distance D
between code words, in the same manner as was the bound for the
unrestricted case. The expression for E is the same for both
and is

4 
where 

A  = 


Rate R is a function of $\lambda$, but the function is not the same for the
binary and unrestricted cases. For the binary case, R is given by


A2' 


R  2  h  2  -  He 


This can be compared with R for the unrestricted case as given in
Part A of this section. To present E as a function of R, several
auxiliary curves were used. Figure 9 plots rate as a function
of $\lambda$ for both cases. Figures 10, 11, and 12 give E as a function
of $\lambda$ for various values of A.

The first lower bound lies slightly under the low rate
bound for the unrestricted case. For lower values of A, it coincides
with the unrestricted bound at low rates. It would be
unreasonable to expect that the binary coding technique could give
a higher reliability than that of the unrestricted case. It must
be remembered that the binary case is only a special case of the
unrestricted coding scheme. If the binary technique could guarantee
that codes could be found whose reliability was no worse than
some value K(R), and K(R) was better than the corresponding bound
for the unrestricted case, then the unrestricted bound could be
raised to match, since the binary group codes are among those
available in the unrestricted case. It is felt by the author that the
close proximity, at low rates, of the lower bounds of the binary and
unrestricted schemes is significant when the highly restrictive
nature of binary coding is compared with the unlimited numbers of
codes available in the unrestricted case.

2.   The Second Lower Bound

The second lower bound is plotted as line D on Figures 4
through 8. This bound lies under the unrestricted random coding
bound for all values of A. Its separation from the unrestricted
bound increases as A increases. For small A, it approaches the
unrestricted bound quite closely. The departure of the second
lower bound from the unrestricted random coding bound can be used
as a much better measure of the loss of reliability by binary coding
than could the first upper bound. The unrestricted random coding
bound says that, from all the codes available, there can be
selected one or more which can have no lower reliability than is
given by the bound. The second lower bound guarantees that
among all the binary codes available one or more exists which can
have no lower reliability than that given by the bound. Since these
bounds are not widely divergent, and in fact are quite close at
the lower values of A, the curves show that binary codes can be
found which are not significantly worse than the average code for
the unrestricted case.


SECTION  V 


CONCLUSIONS 


A.   Summary  of  the  Investigation 

The stated purpose of this investigation was to inquire into
the use of binary codes, particularly binary linear codes, as input
signal sequences for a time-discrete continuous channel. More
specifically, it was desired to determine if binary codes as a
class were so inferior in performance that a different technique of
coding should be sought or if their performance approached the
unrestricted codes closely enough to warrant a more detailed study. The
method of this inquiry was to determine the reliability of such codes
for this channel and to contrast this with the reliability derived for
the more general unrestricted case by Shannon. The reliability
determination was done by the derivation of two upper and two lower bounds
on reliability for the binary coding technique.

The first and second upper bounds demonstrate that, for a
substantial range of the signal to noise ratio, the upper limit of
reliability of the binary coding technique is bounded below the
optimum reliability for the unrestricted case. This is not an
unexpected result. The value of the upper bounds is that they are an
analytical verification of the expected result. The plotted curves
show that the binary method is bounded away from optimum by a
fairly large amount in most cases.

The lower bounds on reliability are the most surprising and
significant results of the investigation. The first lower bound is
valid for low rates of information transmission. This bound at worst
lies only slightly below the corresponding bound for the unrestricted
case. It guarantees that binary linear codes exist which give, at
worst, a reliability only slightly less than that guaranteed for the
unrestricted case. Any lower bound on reliability is, in a sense, the
only positive statement which can be made about the merit of a code.
The close proximity of the lower bounds for the binary and
unrestricted cases indicates that, by the use of binary linear codes, a
value of reliability can be guaranteed which is only slightly worse
than that which is guaranteed if the entire ensemble of codes
available in the unrestricted case is used.

The second lower bound is valid over higher ranges of rate
than is the first lower bound. It, too, lies beneath the unrestricted
bound. The divergence of the two bounds is somewhat greater than is
the case for the first lower bound. The statements about the first
lower bound apply here also, except that the difference in
guaranteed reliability between the binary and unrestricted cases is greater
over the range for which the second lower bound is valid. It should
also be emphasized that the second lower bound is not derived for
binary linear codes, but rather for binary codes in general. It is
the only one of the bounds obtained which has this exception.

Overall, the use of binary codes appears to be a promising
method of coding for the time-discrete continuous channel, provided
the signal-to-noise ratio is not too large. As indicated by the
various figures, the bounds on reliability for the binary case become
increasingly divergent from those of the unrestricted case as the
signal to noise ratio increases. For larger values of A, non-binary
discrete codes would probably yield results much closer to those of
the unrestricted case.

The bounds derived for this investigation are the only
indications available (to the best of the author's knowledge) of what
is possible when a class of discrete codes is used in a time-discrete
continuous channel. It is reasonable to expect that when the input
signals are restricted to a comparatively tiny subset of the total
ensemble of codes available to the channel, the reliability will be
decreased. It is, however, encouraging that the lower bounds for the
binary coding technique can guarantee at lower signal-to-noise ratios
a reliability which is close to that guaranteed by the unrestricted
codes, over a reasonably wide range of transmission rates.

B.   Suggestions  for  Future  Work 

This investigation has shown that the use of binary codes as
an encoding technique for the time-discrete channel is promising. It
can be reasonably conjectured that even better results could be
achieved by the use of non-binary discrete codes, which would make
available additional levels of signal quantization. The possible
problems for future work are numerous. They fall rather naturally
into two groups: the continuation of research in the vein of this
paper, and studies directed toward the application of discrete codes
to actual information handling systems.

The first theoretical investigation might well concern itself
with a sharpening of the bounds for the binary case. A most
informative result would be an optimum or least upper bound. This would pin
down the maximum capability of this type of coding. The lower bounds
herein were obtained, in some cases, by comparatively crude techniques.
It is, however, questionable how much of an improvement can be obtained
in any reasonable fashion.

A most fruitful area for future work would be the use of
non-binary linear codes, i.e., codes from ternary, quaternary, and in
general, q-ary systems. As additional levels of signal quantization
are made available the bounds on reliability, and in particular the
upper bounds, would be expected to approach the unrestricted case,
since, in effect, an infinite number of signal levels are available
in the unrestricted case. A measure or technique for determining
how many levels are required to approach the unrestricted bound to
within some specified interval would be the ultimate result one
might expect.

When the actual application of discrete codes is considered,
several problems are immediately apparent which warrant study. The
need for decoding methods is clear, and an investigation to determine
them would be challenging. The implementation of a maximum
likelihood decoder does not appear to be practical. On the other hand,
digit-by-digit decoding will penalize the reliability. Is there an
effective compromise? Finally, in the application of discrete codes,
some code must be selected. An investigation of specific codes for
use in this application would be informative.

Figure 1.   Interpretation of First Lower Bound

APPENDIX


ALLIED  RESULTS 


A.   Minimum  Power  Theorem 


Suppose that there exists a set of M signals, each of which
is represented as an n-dimensional vector in n-space in a primary
coordinate frame S. Thus the i-th signal will be

$$x_i = (x_{i1}, x_{i2}, \ldots, x_{ij}, \ldots, x_{in}).$$

Each $x_{ij}$ is a voltage, which could be derived from some sampled value
of a more complex signal representation. For simplicity, assume that
each signal component $x_{ij}$ has a duration of 1 second. The power in
the i-th signal is, with respect to frame S, equal to

$$\frac{1}{n} \sum_{j=1}^{n} x_{ij}^2 .$$

The average power P of the ensemble of M signals is then

$$P = \sum_{i=1}^{M} p_i \left( \frac{1}{n} \sum_{j=1}^{n} x_{ij}^2 \right)$$

where $p_i$ is the probability that the i-th signal will occur, and

$$\sum_{i=1}^{M} p_i = 1 .$$

If the signal points remain fixed with regard to the primary
coordinate frame S, then the average power will be a function of the
position of the coordinate system in which the power is calculated.
To see this, assume that the power is calculated with respect to a new
coordinate system $S'$ centered on $K = (k_1, k_2, \ldots, k_n)$. Then

$$P = \sum_{i=1}^{M} p_i \left( \frac{1}{n} \sum_{j=1}^{n} (x_{ij}')^2 \right)$$

where

$$x_{ij}' = (x_{ij} - k_j);$$

thus

$$P = \sum_{i=1}^{M} p_i \left( \frac{1}{n} \sum_{j=1}^{n} (x_{ij} - k_j)^2 \right) \tag{1}$$

which is clearly a function of K.

We wish to know if there exists a meaningful value of K
which will minimize P. Consider a mechanical analogy. Let each
signal vector $x_i$ be the radius vector to a point of mass $m_i$, where
$\sum m_i = M$, and $p_i = m_i/M$. Then, the center of mass of the system
of mass points is given by

$$R = \frac{\sum_{i=1}^{M} m_i x_i}{M} . \tag{2}$$

Thus $R = (r_1, r_2, \ldots, r_j, \ldots, r_n)$, where

$$r_j = \frac{1}{M} \sum_{i=1}^{M} m_i x_{ij} \tag{3}$$

or

$$r_j = \sum_{i=1}^{M} p_i x_{ij} . \tag{4}$$

Taking the derivative of (1) with respect to K, we get

$$\frac{dP}{dK} = \sum_{i=1}^{M} p_i \left( \frac{1}{n} \sum_{j=1}^{n} (-2)(x_{ij} - k_j) \right)$$

since $p_i$ is not a function of K. The second derivative $d^2P/dK^2$ is
positive, hence $dP/dK = 0$ will yield a minimum. Then

$$-2 \sum_{i=1}^{M} p_i \left( \frac{1}{n} \sum_{j=1}^{n} (x_{ij} - k_j) \right) = 0$$

which requires

$$\frac{1}{n} \sum_{i=1}^{M} p_i \sum_{j=1}^{n} (x_{ij} - k_j) = 0$$

and

$$\sum_{i=1}^{M} \sum_{j=1}^{n} p_i (x_{ij} - k_j) = 0 .$$

Interchanging the order of summation, there results

$$\sum_{j=1}^{n} \sum_{i=1}^{M} p_i (x_{ij} - k_j) = 0 .$$

This will be satisfied if

$$\sum_{i=1}^{M} p_i (x_{ij} - k_j) = 0 \tag{5}$$

for each j. Since the sum is on i, (5) may be written

$$\sum_{i=1}^{M} p_i x_{ij} - \sum_{i=1}^{M} p_i k_j = 0 .$$

Recall that $\sum_{i=1}^{M} p_i = 1$; thus the condition

$$\sum_{i=1}^{M} p_i x_{ij} = k_j \tag{6}$$

for all j will minimize the average power in the signal. But (6)
is equivalent to (4). Thus, the average power will be minimized if
the coordinate system wherein power is calculated has its origin
at the center of mass of the signal points.
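
The conclusion can be checked numerically. The following is a minimal sketch, with randomly generated signal points and probabilities chosen purely for illustration (none of these values or function names come from the investigation): it evaluates the average power of equation (1) at the center of mass given by (4) and at many displaced origins, and confirms that no displaced origin gives a smaller value.

```python
import random


def average_power(signals, probs, origin):
    """Average power of equation (1): sum_i p_i (1/n) sum_j (x_ij - k_j)^2."""
    n = len(origin)
    return sum(
        p * sum((x - k) ** 2 for x, k in zip(signal, origin)) / n
        for signal, p in zip(signals, probs)
    )


def center_of_mass(signals, probs):
    """Components r_j = sum_i p_i x_ij of equation (4)."""
    n = len(signals[0])
    return [sum(p * signal[j] for signal, p in zip(signals, probs)) for j in range(n)]


rng = random.Random(1)
# Illustrative ensemble: 6 signals in 4 dimensions with unequal probabilities.
signals = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(6)]
weights = [rng.random() for _ in signals]
probs = [w / sum(weights) for w in weights]

centroid = center_of_mass(signals, probs)
p_min = average_power(signals, probs, centroid)

# Average power measured from 1000 randomly displaced origins.
trials = [
    average_power(signals, probs, [c + rng.uniform(-0.5, 0.5) for c in centroid])
    for _ in range(1000)
]
print(f"power at center of mass: {p_min:.6f}")
print(f"smallest power at a displaced origin: {min(trials):.6f}")
```

Expanding (1) about the center of mass shows why: the power at origin K exceeds the minimum by exactly $(1/n) \sum_j (k_j - r_j)^2$, which is never negative.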

In the investigation reported in the body of this paper, it
has been assumed that all signals (code words) are equally likely,
i.e., $p_i = 1/M$. By construction, each signal is in one-to-one
correspondence with a word of a group code (linear code) and the ensemble
of M signals has constituted a complete group code. It will now be
shown that, under these constraints, the origin of coordinates
(0, 0, ..., 0) of these signals is at the center of mass of the
signal points and hence minimum average power is required.

If the center of mass of the system of signal points is to
coincide with the origin of coordinates, then R = 0, or, from (4),

$$r_j = \sum_{i=1}^{M} p_i x_{ij} = 0$$

for all j. By hypothesis, $p_i = 1/M$, requiring

$$\frac{1}{M} \sum_{i=1}^{M} x_{ij} = 0$$

or

$$\sum_{i=1}^{M} x_{ij} = 0 . \tag{7}$$

Since $x_{ij}$ can assume only the values $\pm B$, (7) will be satisfied
if and only if the positive and negative values each appear M/2
times in the j-th component of the signal.

We will use the notation of linear codes. Consider an (n,k)
linear code over the field of two elements. It is such a code which
has been used in forming the signal vectors, by mapping 1 into B and
0 into -B. Hence, any property derived for the (n,k) code will be
valid for the signals used in this investigation.

This (n,k) linear code contains $M = 2^k$ n-tuples over the
field of two elements, and these n-tuples form a subspace V of the
vector space of all n-tuples over the field of two elements. Arrange
these n-tuples as rows of a matrix. No column of this matrix contains
all zeros, for if such a column existed, it could be deleted and an
(n-1,k) linear code would remain.

Consider the subset of vectors of V, $\{v_1, v_2, \ldots\} = S$,
which have a 0 in the j-th column. Now S is a subspace of V since S
is closed under addition (the sum of any two vectors with 0 in the
j-th column will be a vector with 0 in the j-th column) and closed under
multiplication by scalar field elements (any vector which has a zero
in the j-th component will retain a zero in that component when
multiplied by any scalar). Since S is a subspace of a vector space V,
S is also a vector space and is an Abelian group under addition.
Hence, we may form left cosets based upon S as a subgroup. Selecting
a member of V which contains a 1 in the j-th position (one exists,
since no column is all zeros), we can form a left coset. Now, all
members of V which contain a 1 in the j-th position must appear in
this coset, since two elements v and v' of a group V are in the same
left coset of a subgroup S of V if and only if (-v) + v' is an element
of S, and the sum of any two vectors with a 1 in the j-th position
yields a vector with a 0 in the j-th position. This theorem is proved
by Peterson (21). Since the vectors of V can have only a 1 or 0 in the
j-th position, this partitioning exhausts V and divides the space into
a subset containing all 0's in the j-th position and another subset
containing all 1's in the j-th position. By construction, each subset
has an identical number of elements. Thus, each must contain M/2
vectors. Consequently, in a linear code, 1's and 0's occur M/2 times
in each component of the vector ensemble. Thus, if a linear code is
mapped into a code containing components of $\pm B$ in each vector,
(7) is satisfied. Hence, minimum power is required.
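
The balance property is easy to confirm on a small example. The sketch below is illustrative only: the generator matrix is one systematic form of a (7,4) code chosen for the example, not a code used in the investigation, and the function names and the value of B are assumptions. It enumerates all $2^k$ code words, counts the 1's in each column, and checks condition (7) after mapping 1 into B and 0 into -B.

```python
import itertools


def codewords(generator):
    """All 2^k words of the binary linear code spanned by the generator rows."""
    n = len(generator[0])
    k = len(generator)
    words = []
    for message in itertools.product((0, 1), repeat=k):
        word = tuple(
            sum(m * row[j] for m, row in zip(message, generator)) % 2
            for j in range(n)
        )
        words.append(word)
    return words


def ones_per_column(words):
    """Number of 1's appearing in each component over the whole code."""
    n = len(words[0])
    return [sum(word[j] for word in words) for j in range(n)]


# Example generator matrix of a (7,4) code with no all-zero column.
G = [
    (1, 0, 0, 0, 1, 1, 0),
    (0, 1, 0, 0, 1, 0, 1),
    (0, 0, 1, 0, 0, 1, 1),
    (0, 0, 0, 1, 1, 1, 1),
]

words = codewords(G)
counts = ones_per_column(words)

B = 1.0  # illustrative signal amplitude
signals = [[B if bit else -B for bit in word] for word in words]
column_sums = [sum(signal[j] for signal in signals) for j in range(len(G[0]))]

print("ones per column:", counts)       # each should equal M/2
print("column sums of signals:", column_sums)  # each should be zero, as in (7)
```

The same check applies to any generator matrix without an all-zero column; a column of zeros would give a count of 0 rather than M/2, which is why the argument excludes that case.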

B.   An  Upper  Bound  on  Reliability  for  the  Binary  Symmetric  Channel 

It was felt that the technique used in the derivation of the
first upper bound in Section III, Part B, of the body of this paper,
might yield a comparable result for binary codes used in the binary
symmetric channel (BSC). The resulting bound does prove to be lower
than the sphere-packing bound for low transmission rates.

The Plotkin bound on maximum minimum distance guarantees that,
for an (n,k) code, there exists at least one pair of words, $w_1$ and $w_2$,
which are no farther apart than the bound value of d, which is given
by

d  <  |  ["  1  -  R  +  I  log  (n  -  k  +  1)  +  |J   ,  (1) 

where R = k/n = rate in bits, and logs are to the base 2. Assume
that, in a BSC, one of this pair which has maximum minimum distance,
say $w_1$, is transmitted. Then the contribution to the probability
of error of the code is certainly no less than the probability that
$w_1$ is mistaken for $w_2$, weighted by the probability that $w_1$ is
transmitted. Since all words are equally likely, the probability
that $w_1$ is transmitted is $2^{-k}$. The word $w_1$ will be mistaken for $w_2$
if at least d/2 of the d places in which they differ are changed.
If the BSC has an error probability of p and a probability of correct
transmission of q = (1 - p), then the contribution to the probability
of error of the code, $P_e$, is bounded by

$$P_e \ge 2^{-k} \sum_{i=d/2}^{d} \binom{d}{i} p^i q^{d-i} . \tag{2}$$
Now, discard $w_1$, since an expression bounding its contribution to the
$P_e$ has been obtained. The remaining $(2^k - 1)$ words must contain a
pair which is no farther apart than the bound value of $d_1$, which is
given by


I  S  S  [  x  .  I  log  (2   -  1)  +  I  log  (n  -  k  +  1)  +  ^  ] 
1    2  [_     n  n  n  J 


(3) 




Similarly,  if  one  of  this  pair  is  transmitted  and  decoded  as  the 
other,  it  will  yield  a  contribution  to  the  probability  of  error 
of  at  least 


°1 
2 


i    P  q 


This can be continued until only two words remain. This last
contribution will be
d„ 


d 
2 


■k      /do\ 


p    q 


i=do 


where 


do~     l\l   "  i  +  i   loS    <n   "   k  +   l>    +~1       • 
°  z  L  n        n  n  J 

If the contributions from all these pairs are summed, there
results a lower bound on $P_e$ given by


Pe  -   2 


+ 


i3  I* 


c 
2 

d 


+ 


^ 


i   d 

p    q 


i+  E 

2 


/d  |   i  dx  -  1 

1  p  q 

i 


(4) 


This bound can be simplified. First, since there are $(2^k - 1)$
terms on the right of (4), the inequality is preserved if we retain
only the first $(2^{k-1} + 1)$ terms. Each term on the right of (4)
can be shown to be smaller than the preceding term. Hence, these
$(2^{k-1} + 1)$ remaining terms can be replaced by the $(2^{k-1} + 1)$-th
term, giving


P  5 

e 


i  £ 


i=d  \i 

2 


Ip1^"1   , 


(5) 


where  now  d  is  given  by  the  right  side  of  the  relation 


2  L     n 


log  2R  "  l   +  I  log  (n  -  k  +  1)  +  11 
n  n  J 


or 


d  <  Hf  1  -  R  +  I  log  (n  -  k  +  1)  +  11 
2  L        n  n  J 


(6) 


The  inequality  (5)  is  preserved  and  somewhat  simplified  if  it 
is  written 


»■  M    E 


e    2 


i  d  -  i 

p    q 


(7) 


1-i   I1 


Reliability  E  is,  from  (7) 


E  S  lira  -  i   logf-   V" 
n-*««  n      L  2  L> 


n  -*oo 


-! 


'd 


i  d 

p  q 


l]- 


or 


E  S 


lira  • 
n-»oc 


it 


1  +  log 


d 


2 


/d 


1 1 


i   d  -  i 

p    q 


(8) 


The evaluation of (8) requires that an estimate be made of
the "tail" of a binomial distribution. Peterson (22) gives the
necessary relations. Write d/2 as $\lambda d$, where $\lambda = 1/2$. The sum in
the inequality (8) may then be written

$$\sum_{i=\lambda d}^{d} \binom{d}{i} p^i q^{d-i}$$

and the inequality

$$\sum_{i=\lambda d}^{d} \binom{d}{i} p^i q^{d-i} \le \lambda^{-\lambda d} \mu^{-\mu d} p^{\lambda d} q^{\mu d} \tag{9}$$

can be used, provided $\lambda > p$. Here $(\lambda + \mu) = 1$. As the code word
length n increases, d will also increase. Peterson (22) shows that,
for large d,

$$\frac{1}{d} \log \left( \lambda^{-\lambda d} \mu^{-\mu d} p^{\lambda d} q^{\mu d} \right) = -F(\lambda, p) \tag{10}$$

where

$$F(\lambda, p) = H(p) - H(\lambda) + (\lambda - p) H'(p), \tag{11}$$

$$H(x) = -x \log x - (1 - x) \log (1 - x),$$

and

$$H'(x) = \log \left( \frac{1 - x}{x} \right) .$$
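
The tail estimate can be examined numerically. The sketch below assumes that (9) is the bound $\sum_{i \ge \lambda d} \binom{d}{i} p^i q^{d-i} \le \lambda^{-\lambda d} \mu^{-\mu d} p^{\lambda d} q^{\mu d}$ and that the right side equals $2^{-d F(\lambda, p)}$ with logs to the base 2; the values d = 200 and p = 0.01 and the function names are illustrative choices. It compares the exact tail with the bound and with the exponent $F(\lambda, p)$.

```python
import math


def H(x):
    """Binary entropy in bits."""
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def H_prime(x):
    """Derivative of H: log2((1 - x) / x)."""
    return math.log2((1 - x) / x)


def F(lam, p):
    """Exponent of equation (11)."""
    return H(p) - H(lam) + (lam - p) * H_prime(p)


def binomial_tail(d, lam, p):
    """Exact sum over i >= lam*d of C(d, i) p^i q^(d-i)."""
    q = 1 - p
    start = math.ceil(lam * d)
    return sum(math.comb(d, i) * p**i * q ** (d - i) for i in range(start, d + 1))


def tail_bound(d, lam, p):
    """Assumed right side of (9), written as 2^(-d F(lam, p))."""
    return 2.0 ** (-d * F(lam, p))


d, lam, p = 200, 0.5, 0.01  # illustrative values only
tail = binomial_tail(d, lam, p)
bound = tail_bound(d, lam, p)
print(f"exact tail      = {tail:.3e}")
print(f"bound (9)       = {bound:.3e}")
print(f"-(1/d) log tail = {-math.log2(tail) / d:.4f}")
print(f"F(lambda, p)    = {F(lam, p):.4f}")
```

For $\lambda = 1/2$ the exponent reduces to $F(0.5, p) = -\frac{1}{2} \log 4pq$, which the sketch can also be used to confirm.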

Thus, using (10), the bound (8) is written

$$E \le \lim_{n \to \infty} \left[ \frac{1}{n} + \frac{d}{n} F(\lambda, p) \right] .$$

But,    from   the   Plotkin  bound    (6), 
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i<  II"  l  -  R+l  log  (n  -  k  +  1)  +  ^  1  . 
n    2  L         n  n  J 

Also, recalling that $\lambda = 1/2$, there results, in the limit,

$$E \le \frac{1}{2} F(0.5, p) (1 - R) . \tag{12}$$

The optimum or sphere-packing bound for the binary symmetric
channel is due to Elias (23). It is given by

$$E \le F(\lambda_0, p) \tag{13}$$

where $F(\lambda_0, p)$ is given by (11), and $\lambda_0$ is defined by the
expression

$$1 - H(\lambda_0) = R . \tag{14}$$

Figure 13 shows a typical plot of the bound derived here, as given
by (12), and the sphere-packing bound as given by (13), for
p = 0.01. The bound (12) is lower than the sphere-packing bound for
low transmission rates R.
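
The comparison plotted in Figure 13 can be sketched numerically. The code below assumes that (12) has the form $E \le \frac{1}{2} F(0.5, p)(1 - R)$, which is what the limit of the preceding bound gives when d/n is replaced by its Plotkin limit $(1 - R)/2$, and that $\lambda_0$ in (13) is the root of (14) lying in (0, 1/2]. The function names, the bisection tolerance, and the sample rates are illustrative assumptions.

```python
import math


def H(x):
    """Binary entropy in bits, with H(0) = H(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def F(lam, p):
    """Exponent of equation (11), logs to the base 2."""
    return H(p) - H(lam) + (lam - p) * math.log2((1 - p) / p)


def lambda_0(rate, tol=1e-12):
    """Solve 1 - H(lambda_0) = rate for lambda_0 in (0, 1/2] by bisection."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if 1 - H(mid) > rate:
            lo = mid  # H(mid) is too small, so lambda_0 lies above mid
        else:
            hi = mid
    return (lo + hi) / 2


def low_rate_bound(rate, p):
    """Assumed form of bound (12)."""
    return 0.5 * F(0.5, p) * (1 - rate)


def sphere_packing_bound(rate, p):
    """Bound (13) with lambda_0 taken from (14)."""
    return F(lambda_0(rate), p)


p = 0.01
for rate in (0.05, 0.1, 0.2, 0.5):
    print(
        f"R = {rate:4.2f}  bound (12) = {low_rate_bound(rate, p):.3f}"
        f"  sphere-packing = {sphere_packing_bound(rate, p):.3f}"
    )
```

Under these assumptions the straight-line bound lies below the sphere-packing bound only at the lowest rates and crosses above it as R increases, in keeping with the statement that the derived bound is the sharper one for low transmission rates.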

Since the Plotkin bound (1) is applicable to binary codes in
general, the results obtained here for $P_e$, expression (7), and E,
expression (12), are valid for the entire class of binary codes.
Weldon (24) has obtained low rate results for the BSC which are
applicable to group codes only. His result for E is the same as
(12), while his result for $P_e$ is higher than (7) by a factor of 2.